NOVEL OLIGONUCLEOTIDE ARRAYS AND THEIR USE FOR SORTING,
ISOLATING, SEQUENCING, AND MANIPULATING NUCLEIC ACIDS

Field of the Invention

This invention is in the field of sorting, isolating,
sequencing, and manipulating nucleic acids.

Background of the Invention 

Ordered arrays of oligonucleotides ("oligos") immobilised on
a solid support have been proposed for sequencing DNA fragments.
It has been recognised that hybridisation of a cloned single-
stranded DNA fragment to all possible oligo probes of a given
length can identify the corresponding, complementary oligo
segments that are present somewhere in the fragment, and that
this information can sometimes be used to determine the DNA
sequence. Use of arrays can greatly facilitate the surveying of
a DNA fragment's oligo segments.

In an oligonucleotide array each oligo probe is immobilized
on a solid support at a different predetermined position. The
array allows one to simultaneously survey all the oligo segments
in a DNA fragment strand. Many copies of the strand are
required, of course. Ideally, surveying is carried out under
conditions to ensure that only perfectly matched hybrids will
form. Oligo segments present in the strand can be identified by
determining those positions in the array where hybridization
occurs. The nucleotide sequence of the DNA sometimes can be
ascertained by ordering the identified oligo segments in an
overlapping fashion. For every identified oligo segment, there
must be another oligo segment whose sequence overlaps it by all
but one nucleotide. The entire sequence of the DNA strand can be
represented by a series of overlapping oligos, each of equal
length, and each located one nucleotide further along the
sequence. As long as every overlap is unique, all of the
identified oligos can be assembled into a contiguous sequence
block.

There is an important limitation to sequencing by known
surveying techniques. As relatively longer DNA strands are
surveyed, there is an increasing probability that more than two
identified oligos will share the same overlapping sequence, i.e.,
the overlap is not unique. When this occurs, the sequence of the
DNA cannot be unambiguously determined. Instead of one
contiguous sequence block that contains the entire DNA sequence,
the oligos can only be assembled into a number of smaller
sequence blocks whose order is not known.
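
The assembly rule and its limitation can be illustrated with a
minimal Python sketch. It is an illustration only, not a method
taken from this specification: the function names and the toy
sequences are hypothetical, sequences are written 5' to 3', and
the routine is deliberately conservative in that it joins two
oligos only where their overlap is unique and makes no attempt to
order the resulting blocks. Purely circular chains of oligos are
not handled.

```python
from collections import defaultdict


def oligo_spectrum(seq, k):
    """Return the set of all k-nucleotide oligo segments present in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assemble_blocks(oligos):
    """Join oligos that overlap by all but one nucleotide, but only where
    that overlap is unique. Returns the resulting sequence blocks, sorted.
    Assumes all oligos have the same length k >= 2."""
    oligos = set(oligos)
    k = len(next(iter(oligos)))
    starts_with = defaultdict(list)  # overlap -> oligos beginning with it
    ends_with = defaultdict(list)    # overlap -> oligos ending with it
    for o in oligos:
        starts_with[o[:-1]].append(o)
        ends_with[o[1:]].append(o)

    def unique(overlap):
        # Exactly one oligo ends with the overlap and exactly one begins with it.
        return len(starts_with[overlap]) == 1 and len(ends_with[overlap]) == 1

    blocks = []
    for o in sorted(oligos):
        if unique(o[:-1]):
            continue  # o is reached through a unique overlap; not a block start
        block = o
        while unique(block[-(k - 1):]):
            block += starts_with[block[-(k - 1):]][0][-1]
        blocks.append(block)
    return sorted(blocks)


# Every overlap is unique: one contiguous block is recovered.
print(assemble_blocks(oligo_spectrum("ATGCGTAC", 3)))
# The overlap "CG" is shared by four oligos: several smaller blocks result.
print(assemble_blocks(oligo_spectrum("ACGTTCGA", 3)))
```

In the second call the overlap "CG" is shared by the oligos ACG,
TCG, CGT and CGA, so the routine stops at each of them and reports
separate blocks rather than a single sequence.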

Summary of the Invention 

We have invented new oligonucleotide arrays and methods of
using them.

A "binary array" according to the invention contains
immobilized oligos comprised of two sequence segments of
predetermined length, one variable and the other constant. The
constant segment is the same in every oligo of the array. The
variable segments can vary both in sequence and length. Binary
arrays have advantages compared with ordinary arrays: (1) they
can be used to sort strands according to their terminal
sequences, so that each strand binds to a fixed location (an
address) within the array; (2) longer oligos can be used on an
array of a given size, thereby increasing the selectivity of
hybridisation; this allows strands to be sorted according to the
identity of internal oligo segments adjacent to a particular
constant sequence (such as a segment adjacent to a recognition
site for a particular restriction endonuclease), and this allows
strands to be surveyed for the presence of signature oligos that
contain a constant segment in addition to a variable segment;
(3) universal sequences, such as priming sites, can be introduced
into the termini of sorted strands using the binary arrays,
thereby enabling the strands' specific amplification without
synthesizing primers specific for each strand, and without
knowledge of each strand's terminal sequences; and (4) the
specificity of hybridisation during surveying can be increased by
coupling hybridisation to a ligation event that discriminates
against terminal basepair mismatches.

A "sectioned array" as used herein is one divided into
sections, so that every individual area is mechanically separated
from all other areas, such as, for example, a depression on the
surface, or a "well". The areas have different oligos
immobilised thereon. A sectioned array allows many reactions to
be performed simultaneously, both on the surface of the solid
support and in solution, without mixing the products of different
reactions. The reactions occurring in different wells are highly
specific due to the nucleotide sequence of the immobilised oligo.
A large number of sortings and manipulations of nucleic acids can
be carried out in parallel, by amplifying or modifying only those
nucleic acids in each well that are perfectly hybridized to the
immobilized oligos. Nucleic acids prepared on a sectioned array
can be transferred to other arrays (replicated) by direct
blotting of the wells' contents (printing), without mixing the
contents of different wells of the same array. Furthermore, the
presence of individual sections in arrays allows multiple
rehybridisations of bound nucleic acids to be performed,
resulting in a significant increase in hybridization specificity.
It is particularly advantageous according to this invention to
use a binary array that is sectioned.

Our invention includes methods of using sectioned arrays to
sort mixtures of nucleic acid strands, either RNA or DNA. As
used herein, "strand" means not just a single strand, but
multiple copies thereof; and "mixture of strands" means a mixture
of copies of different strands no matter how many copies of each
are present. Similarly "fragment" refers to multiple copies
thereof, and "mixture of fragments" means a mixture of copies of
different fragments. The methods include sorting strands either
according to their terminal oligo segments (3'-terminal or
5'-terminal), or according to their internal oligo segments on a
binary array. Before or after sorting, universal priming
region(s) can be added to the strands' termini to enable
amplification. Binary sectioned arrays for sorting according to
strands' terminal sequences ("terminal sequence sorting arrays")
can be comprehensive. A "comprehensive array" is one wherein any
possible strand will hybridise to at least one immobilized oligo.
This type of sorting is particularly useful for preparing
comprehensive libraries of fragments of a large genome. For
example, in one embodiment of the invention, strands of
restriction fragments have their restriction sites restored and
are sorted on a binary array. That array contains immobilised
oligos whose constant segments contain the sequence complementary
to the restriction site, and an adjacent variable segment. The
array is complete, containing all variable sequences of each type
in separate areas.

Our invention also includes using sectioned arrays for
preparing every possible partial copy of a strand or a group of
strands. The term "partial" refers to multiple copies thereof.
Partials are prepared by either of the following methods: (1)
terminal sorting on a binary sectioned array of a mixture of all
possible partial strands generated by random degradation of a
parental strand; or (2) generation of partials directly on an
array, through the sorting on an ordinary sectioned array of
parental strands according to the identity of their internal
oligo sequences, followed by the synthesis of partial copies of
each parental strand by enzymatic extension of the immobilised
oligos utilising the hybridised parental strands as templates.
In either case, generated partials correspond to a parental
strand whose 3' or 5' end is truncated to all possible extents
(at the "variable" end of the partial), and whose other end is
preserved (at the "fixed" end of the partial). These are
"one-sided partials." Unless otherwise indicated the word
"partial" is used herein to refer to one-sided partials.

Our invention also includes methods of using oligo arrays to
obtain oligo information as part of a process for determining the
nucleotide sequence of a long nucleic acid strand, or of many
nucleic acid strands in an unknown mixture. A complete set of
one-sided partials of the strand or strands is prepared on a
sectioned array, and the oligo content of the partial strands in
each well of the array is separately surveyed (i.e., each group
of partials sharing the same oligo at the partials' variable end
is surveyed).

Our invention also includes methods of using oligo arrays
for ordering previously sequenced fragments from a first
restriction digest of a large nucleic acid or even a genome.

Our invention also includes methods of using oligo arrays
for allocating sequenced and ordered allelic fragments into their
chromosomal linkage groups.

Our invention also includes a method of using binary arrays
for surveying the oligos contained in strands or their partials.
This method provides improved comprehensive surveys over the
conventional surveying of oligos on an ordinary array.

Brief Description of the Drawings 

Figure 1 shows a binary array. 

Figure 1a shows an oligo immobilised in an area of a binary
array.

Figure 2 shows a sectioned array having depressions. 

Figure 2a shows a well of a sectioned array.

Figure 3 shows addition of a lattice to a support to make a 
sectioned array. 

Figure 4 shows an example of sorting and amplification of 
restriction fragments on a sectioned binary array. 

Figure 5 shows an example of preparing partials on a sec- 
tioned ordinary array. 

Figure 6 shows schematically the order of steps for
sequencing a complete genome.

Figure 7 shows schematically the use of a sheet with a
number of miniature survey arrays for simultaneously surveying
every well in a partialing array.

Figures 8 to 11 show examples of the determination of
nucleotide sequences from indexed address sets obtained from
analysis of mixtures of strands.

Detailed Description of the Invention 

I. Oligonucleotide arrays

As used herein an "oligonucleotide array" is an array of
regularly situated areas on a solid support wherein different
oligos are immobilised, typically by covalent linkage. Each area
contains a different oligo whose location is predetermined.

Arrays can be classified by the composition of their
immobilised oligos. "Ordinary arrays" contain oligos comprised
entirely of "variable segments". Every position of the oligo
sequence in such a segment can be occupied by any one of the four
commonly occurring nucleotides.

Comprehensive ordinary arrays are those wherein any segment 
of any possible strand will hybridise perfectly to the length of 
one or more immobilized oligos so that no strand is lost.

Binary arrays differ from ordinary arrays. A binary array
is illustrated in Figures 1 and 1a. Figure 1 shows a substrate
or support 1 having immobilised thereon an array of oligos 3,
each oligo being in a separate area 2 of support 1. Figure 1a
shows one area 2. A binary oligo 3 (many copies, of course)
comprised of constant region 5 and variable region 6 is
covalently bound to support 1 by covalent linking moiety 4.
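
The composition of a binary array can be made concrete with a
short Python sketch. It is illustrative only: the function name,
the parameter names and the example constant segment are
hypothetical, and oligos are written 5' to 3'. It enumerates one
oligo per area for a variable segment of n nucleotides, showing
that the number of areas depends only on n and not on the length
of the constant segment.

```python
from itertools import product


def binary_array(constant, n, constant_upstream=True):
    """List the oligos (5' to 3') of a binary array: one shared constant
    segment joined to every possible variable segment of n nucleotides."""
    oligos = []
    for bases in product("ACGT", repeat=n):
        variable = "".join(bases)
        if constant_upstream:
            oligos.append(constant + variable)   # constant toward the 5' end
        else:
            oligos.append(variable + constant)   # constant toward the 3' end
    return oligos


array = binary_array("GAATTC", 2)   # hypothetical 6-nucleotide constant segment
print(len(array))                   # number of areas: 4**2
print(len(array[0]))                # each oligo is 6 + 2 nucleotides long
```

An ordinary array of oligos of the same total length (eight
variable nucleotides) would need 4**8 = 65,536 areas, compared
with 16 areas here.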

Because of the constant segments, binary arrays provide
means for the hybridisation of longer sequences without
increasing the size of the array. The constant segment can be
located within the immobilised oligo either "upstream" of the
variable segment (i.e., toward or at the 5' end of the oligo) or
"downstream" from the variable segment (i.e., toward or at the 3'
end of the oligo). The type of array that is chosen depends on
the specific application. The constant region preferably is or
includes a good priming region for amplification of hybridised
strands by a polymerase chain reaction (PCR), or a promoter for
copying the strand by transcription. Generally a length of 15 to
25 nucleotides is suitable for priming. The constant region can
contain all or part of the complement of a restriction site. A
binary array can be "plain" or "sectioned" (see below).

"Plain arrays" known in the art are arrays in which the
individual areas are not physically separated from one another.
Reactions carried out simultaneously are limited to those in
which the nucleic acid templates and the reaction products are
bound in some manner to the surface of the array to avoid the
intermixing of products.

"Sectioned arrays" are divided into sections, so that each
area is physically separated by mechanical or other means (e.g.,
a gel) from all the other areas, e.g., by depressions on the
surface, each called a "well". There are many techniques
apparent to one skilled in the art for preventing the exchange of
materials between areas; any such method can be used to make a
"sectioned" array, as that term is used herein, even though there
might not be a physical wall between areas.

One type of sectioned array is illustrated in Figures 2 and
2a. Figure 2 shows a support sheet 60 having an array of
depressions or wells 62, each containing many copies of an
immobilized oligo 64. Figure 2a shows one well 62 of the array
of Figure 2. Well 62 formed in support 60 has therein oligo 64
covalently bound to support 60 by covalent linking moiety 66. In
practice one may prepare a plain array, e.g., on a flat sheet,
and then, at a point during a series of steps involving its use,
convert the array into a sectioned array, e.g., by making
physical depressions in a deformable solid support to isolate the
individual areas. The sectioned array can also be created by
applying a lattice to the solid support and bonding it to the
surface so that each area is surrounded by impermeable walls. An
exploded perspective view of such a sectioned array is shown in
Figure 3. Support or substrate 70, here a planar sheet, has
mounted thereon and affixed thereto a lattice 72 comprised of a
series of horizontal members 74, 76. The lattice members define
a series of open areas which, in conjunction with support 70,
define an array of wells 78. In some applications it is
preferable to utilise a detachable lattice (or a removable cover
sheet) so that the sectioned array can be converted back to a
plain array.

Sectioned arrays according to this invention can be used to
increase the specificity of hybridisation of nucleic acids to the
immobilized oligos. After hybridisation, unhybridized strands
can be washed away. Hybridised strands can then be released into
solution without mixing. Released strands can be rebound to the
immobilized oligos, and unhybridized strands can be washed away.
Each successive release, rebinding, and washing increases the
ratio of perfectly matched hybrids to mismatched hybrids.
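
The cumulative effect of repeated cycles can be sketched with a
simple model. The model and its numbers are assumptions made
purely for illustration and are not data from this specification:
it supposes that each cycle of release, rebinding and washing
retains a fixed fraction of the perfectly matched hybrids and a
smaller fixed fraction of the mismatched hybrids.

```python
def enrichment(initial_ratio, keep_perfect, keep_mismatch, cycles):
    """Ratio of perfect to mismatched hybrids after a number of cycles,
    assuming (hypothetically) constant retention fractions per cycle."""
    ratio = initial_ratio
    for _ in range(cycles):
        ratio *= keep_perfect / keep_mismatch
    return ratio


# Hypothetical retention fractions: 90% of perfect hybrids and 30% of
# mismatched hybrids survive each cycle.
for cycles in range(4):
    print(cycles, round(enrichment(1.0, 0.9, 0.3, cycles), 2))
```

Under these assumed fractions the ratio grows threefold with each
cycle; the actual gain depends on the hybridisation and washing
conditions used.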

An array can be "3'" or "5'". "3' arrays" possess free 3'
termini and "5' arrays" possess free 5' termini. The immobilized
oligos in a 3' array can be extended at their 3' termini by
incubation with a nucleic acid polymerase. If it is a
template-directed polymerase, only immobilised oligos hybridised
to a template strand can be extended.

Methods of oligodeoxyribonucleotide synthesis directly on a
solid support are also known in the art, including methods
wherein synthesis occurs in the 3' to 5' direction (so that the
oligos will possess free 5' termini). Methods wherein synthesis
occurs in the 5' to 3' direction (so that the oligos will possess
free 3' termini) are also known.

Suitable substrates or supports for arrays should be
non-reactive with reagents to be used in processing, washable
under stringent conditions, not interfere with hybridisation and
not be subject to inordinate non-specific binding. Examples
include treated glass, polymers of various kinds (e.g., polyamide
and polyacrylmorpholide), latex-coated substrates and silica
chips.

Arrays can be made over a wide range of sizes. In the 
example of a square sheet, the length of a side can vary from a 
few millimeters to several meters. 

II. Sorting nucleic acids

Our invention allows mixtures of strands to be sorted
according either to their terminal oligo segments ("terminal
sorting") or their internal oligo segments ("internal sorting")
on a binary array.

There are two important aspects of our invention for
sorting. First, each strand in a mixture can be made to
hybridise at only a few, or a single, location. And second, each
strand can be provided with universal terminal priming regions
that enable PCR amplification without prior knowledge of the
terminal nucleotide sequences and without the need to synthesize
individual primers.

For terminal sorting, the priming region(s) can be made
essentially dissimilar from the sequences occurring in the
nucleic acids that are present in the mixture to be sorted, so
that priming does not occur anywhere but at the strands' termini.
When strands from a complete restriction digest of a DNA are to
be terminally sorted and amplified, priming only at the strands'
termini can be promoted by restoring the terminal restriction
sites (those sites having been eliminated from internal regions
by complete digestion) concomitant with the generation of
terminal priming regions.

Terminal sorting is carried out on a binary array, which
preferably is sectioned. The immobilised oligos contain a
constant segment complementary to either the strands' 3' priming
region or 5' priming region. Thus, each strand can only be
hybridised to one location within the array. By sorting on a
comprehensive array, every strand is bound somewhere within the
array. This is especially important for the preparation of a
comprehensive library of fragments of a long nucleic acid or a
genome.

Strands can be sorted on either 3' or 5' arrays in which the
constant segment is located either upstream or downstream of the
variable segment. High specificity of sorting can be achieved by
employing 3' arrays in which the constant segment of the
immobilised oligos is upstream. In that case, sorting can be
followed by the generation of an immobilized copy of each sorted
strand using the immobilized oligos as primers for the synthesis
of a complementary copy of that strand when the array is
incubated with an appropriate DNA polymerase. The generation of
copies covalently linked to the array enables the array to be
vigorously washed to remove non-covalently bound material before
strand amplification. It also enables the arrays to serve as
permanent banks of sorted strands which can subsequently be
amplified over and over to generate copies for further use.

A strand sorting procedure is shown in Figure 4. A DNA
sample 10 is completely digested with a restriction endonuclease.
The ends of each fragment are restored, and universal priming
sequences 17 generated in the process to prepare fragments 11 for
sorting. It is not necessary that priming sequences be added at
both ends, if only linear amplification is desired. Nor is it
necessary that the priming sequence at the 3' end of a strand be
the same as the priming sequence at the 5' end.

The strands are then melted apart 12 and hybridised to a
terminal sequence binary sorting array, whose immobilized oligos
14 contain a variable segment 15 and a constant segment 16 which
is complementary to the universal priming region 17, including
the restored recognition site of the restriction enzyme 16a, 17a.
Each strand binds at a location dependent upon its variable
sequence 100 adjacent to its priming sequence. At this point the
array need not be sectioned. The array is then washed to remove
unhybridised strands. The entire array is then incubated with
DNA polymerase. Consequently, a complementary copy 18 of each
hybridised DNA strand is generated by extension of the 3' end of
the oligo to which the strand is bound. The array is then
vigorously washed to remove the original DNA strands and all
other material not covalently bound to the surface (not shown).

The covalently bound copy strands can be amplified. During
amplification it is usually desirable that the array be
sectioned. The wells are filled with a solution containing
universal primers 19, 20, an appropriate DNA polymerase, and the
substrates and buffer needed to carry out PCR. The array can, if
desired, be sealed with a cover sheet, further isolating the
wells from each other. PCR is carried out simultaneously in each
well of the array. This results in sorting the mixture of
strands into groups of strands that share the same terminal oligo
sequence, each strand (or each group of strands) being present in
a different well of the array and amplified there.
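
The addressing step of this procedure can be sketched in Python.
The sketch is illustrative only: the function names, the priming
region and the strand sequences are hypothetical, sequences are
written 5' to 3', and hybridisation is idealised as perfect
matching. It models a 3' array with upstream constant segments,
where each immobilised oligo is the complement of the priming
region followed by one variable segment, so that a strand's well
is fixed by the n nucleotides adjacent to its 3' priming region.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def sort_by_3prime_terminus(strands, priming_region, n):
    """Map each well (keyed by the variable segment of its immobilised
    oligo) to the strands that hybridise there."""
    wells = defaultdict(list)
    for strand in strands:
        if not strand.endswith(priming_region):
            continue  # no 3' priming region: not bound, washed away
        body = strand[:-len(priming_region)]
        if len(body) < n:
            continue  # too short to cover the variable segment
        variable = revcomp(body[-n:])  # oligo segment pairing with the strand
        wells[variable].append(strand)
    return dict(wells)


primer = "GGAATTC"  # hypothetical priming region with a restored site
mixture = ["AAACCC" + primer, "TTTCCC" + primer, "AAAGGG" + primer, "ACGTACGT"]
for variable, bound in sorted(sort_by_3prime_terminus(mixture, primer, 3).items()):
    print(revcomp(primer) + variable, len(bound))
```

Each printed oligo consists of the constant segment (the
complement of the priming region) followed by the variable
segment, together with the number of different strands sorted
into that well.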

The results of hybridisation can be improved by
"proofreading", or editing, the hybrids formed, by selectively
destroying those hybrids that contain mismatches, without
affecting perfect hybrids.

The length of the immobilised oligos in a strand sorting
array is chosen to suit the number of strands to be sorted. When
sorting strands according to their terminal sequences, the number
of different strands obtained in each well equals the number of
times that a particular oligo complementary to the variable
segment of the immobilised oligo occurs among the termini of
different strands in the mixture. If the number of nucleotides
in each variable segment is $n$, then the total number of such
variable sequences is $4^n$, and the mean number of different
strands in a well is $N/4^n$, where $N$ is the number of
different strands in the mixture, provided that nucleotide
sequence is random, and that each of the four nucleotides is
present in equal proportion. If a random sequence that is the
size of an entire diploid human genome ($6 \times 10^9$
basepairs) is completely digested by a restriction endonuclease
that has a hexameric recognition site, then the resulting mixture
will contain approximately $3 \times 10^6$ strands with an
average length of 4,096 nucleotides. If this mixture is then
applied to a comprehensive binary array having variable segments
eight nucleotides long, then each well will contain, on average,
approximately 45 different strands.
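
The arithmetic behind these figures can be reproduced with a few
lines of Python. The function names are hypothetical, and the
calculation rests on the same assumptions as the text (random
sequence, equal nucleotide proportions); counting two strands per
double-stranded fragment is our reading of how the figure of
approximately 3 million strands arises.

```python
def mean_fragment_length(site_length):
    """Expected spacing between occurrences of a recognition site of the
    given length in a random sequence with equal base frequencies."""
    return 4 ** site_length


def mean_strands_per_well(genome_bp, site_length, n):
    """Mean number of different strands per well, N / 4**n."""
    fragments = genome_bp / mean_fragment_length(site_length)
    strands = 2 * fragments  # assumed: two strands per double-stranded fragment
    return strands / 4 ** n


print(mean_fragment_length(6))                            # 4096
print(round(2 * 6 * 10**9 / mean_fragment_length(6)))     # about 3 million strands
print(round(mean_strands_per_well(6 * 10**9, 6, 8), 1))   # about 45 per well
```

Lengthening the variable segment by one nucleotide quadruples the
number of wells and so divides the mean occupancy of each well by
four.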

Our invention also includes methods for isolating individual
strands by sorting them according to the identity of their
terminal sequences on sectioned binary arrays. The strands can
be from restriction fragments or not, so long as unique priming
sequences are added to at least one of the strand's termini, such
as by methods described herein. If the number of different
strands in a sample is rather small, there is a high probability
that after the first stage of sorting, many wells will either not
be occupied, or be occupied by only one type of fragment. In the
case of a complex mixture of strands (such as from the digestion
of an entire human genome), a number of different types of
fragments will occupy each well. In that case, the isolation of
individual fragments can be achieved by PCR amplifying the
strands in each well in the first stage of sorting and then
sorting the group of fragments from each well on a fresh
sectioned array. After symmetric PCR amplification, each well of
the first array will contain copies of the strands that were
originally hybridized there, and also their complementary copies.

If the original strands were sorted by their 3' ends, then
their copies in a given well will all possess the same
3'-terminal sequence, and their complementary copies will possess
the same 5' end. However, the 3'-terminal sequences of the
complementary copies of the original strands in each well will be
different (as will be the 5'-terminal sequences of the original
copies). Therefore, the complementary strands will bind at
different locations within the new sectioned array, according to
the identity of their own 3'-terminal sequences, and with a high
probability, each of them will occupy a separate well, where they
can then be amplified.

Alternatively, the second stage of sorting can be carried
out according to the identity of the terminal sequences at the
other end of each strand. For example, if the strands were
sorted in the first stage by their 3' ends (on an array whose
immobilised oligos contain upstream constant segments), then the
groups of strands from each well in the first array can be sorted
in a second stage by their 5' termini (on an array having
downstream constant segments). In either procedure, as a result
of the second round of sorting, almost all of the different types
of fragments are separated from one another (with the exception
of virtually identical allelic strands from a diploid genome,
which usually have identical termini, and consequently are sorted
into the same well). The isolated strands can then be used for
any purpose. For example, they can be inserted into vectors and
cloned, or they can be amplified and their sequences determined.
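
The two-stage procedure can be pictured with a small Python
sketch. It is a simplification made for illustration: the
function name and the sequences are hypothetical, priming regions
are omitted, and wells are keyed directly by the strand's own
terminal segment rather than by the complementary immobilised
oligo.

```python
from collections import defaultdict


def two_stage_sort(strands, n):
    """Sort strands first by their 3'-terminal n nucleotides, then sort the
    contents of each first-stage well by the 5'-terminal n nucleotides."""
    first_stage = defaultdict(list)
    for strand in strands:
        first_stage[strand[-n:]].append(strand)

    result = {}
    for end3, group in first_stage.items():
        second_stage = defaultdict(list)
        for strand in group:
            second_stage[strand[:n]].append(strand)
        result[end3] = dict(second_stage)
    return result


mixture = ["ACGTTTGGA", "CCGTATGGA", "ACGAAATCC"]
sorted_wells = two_stage_sort(mixture, 3)
for end3 in sorted(sorted_wells):
    for end5 in sorted(sorted_wells[end3]):
        print(end3, end5, sorted_wells[end3][end5])
```

The first two strands share a 3' terminus and so share a
first-stage well, but they differ at their 5' termini and are
separated in the second stage.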

Our invention also includes the use of binary arrays for
isolating selected strands by sorting according to the identity
of terminal sequences. Strands can, for example, be selected
that contain particular regions (such as genes) of special
interest from a clinical viewpoint. After the relevant portion
of a genome has been sequenced, an array can be made using only
preselected oligos whose variable segments uniquely match the
terminal sequences of the strands of interest, i.e., they would
be long enough to uniquely hybridise to the desired strands.

Our invention also encompasses methods that include sorting
fragments according to their internal sequences. When so
sorting, strands may bind at more than one well. This type of
sorting can be useful for a number of applications, such as the
isolation of strands that contain particular internal sequence
segments (utilising a sectioned ordinary array), or the sorting
of strands according to the identity of variable oligo segments
adjacent to internal restriction sites of a particular type
(utilising a sectioned binary array). The latter approach is
useful for ordering sequenced restriction fragments. The sorting
of strands by their internal segments on a 3' sectioned ordinary
array is useful for the generation of partial strands by virtue
of extension of the immobilised oligos.

Our invention includes the sorting, in particular for
sequencing, of natural mixtures of molecules, such as
cellular RNAs. Establishing messenger RNA sequences is useful,
not only for the identification and localisation of genes in the
genomic DNA, but also for providing information necessary to
determine the coding gene sequences (i.e., the exon/intron
structure of each gene). Furthermore, the analysis of cellular
RNAs in different tissues, at different stages of development,
and in the course of a disease, will clarify which genes are
active. Usually, RNAs are short enough to be sorted and analysed
without preliminary fragmentation.

III. Preparing partial strands of nucleic acids on sectioned 
arrays 

Our invention includes methods of using sectioned arrays for
preparing all possible partial copies of a strand or a group of
strands. Preparing complete sets of partials of a strand(s), and
sorting the partials by their variable ends is especially useful
in a process for determining the sequence of the strand or
strands. The preparation of partials is accomplished by either
of the following methods: (1) terminally sorting on sectioned
binary arrays a mixture of partial strands generated by
degradation of a "parental" strand(s) at random; or (2)
generating partials on a sectioned ordinary array, through the
sorting of a parental strand(s) according to the identity of the
strand's internal sequences, followed by the synthesis of
(complementary) partial copies of the parental strand(s) by the
enzymatic extension of the immobilized oligos, utilizing the
hybridised parental strands as templates, and then copying the
immobilized partials.



WO 93/17126



PCT/fjS93/01S52 




By using comprehensive arrays, it is possible to prepare every 
possible one-sided partial of a strand. 

In the first case (partialing before sorting), a strand, or
a double-stranded fragment, or a group of either, carrying
terminal priming regions (these can be a strand or a group of
strands sorted on a sectioned binary array as described above),
is randomly degraded by a chemical or an enzymatic method, or by
a combination of both. Then the mixture of partials is sorted on
a sectioned binary array according to the identity of their newly
generated termini, essentially as described above for the sorting
of full-length strands by their terminal sequences, with new
priming sites being introduced at these new termini either before
or after sorting. Only those partials that possess both the
newly introduced priming site and the already existing priming
site (at the opposite end) will be amplified by subsequent PCR.
Partials can be sorted according to the identity of a variable
sequence at either their 3' termini or their 5' termini.
However, as is the case for the sorting of full-length strands,
the highest specificity can be achieved by sorting according to
the identity of a variable sequence at the 3' termini, and
carrying out the sorting on 3' arrays having upstream constant
segments, or by sorting according to the identity of a variable
sequence at the 5' termini, and carrying out the sorting on 5'
arrays having downstream constant segments. In these cases,
sorting can be followed by the generation of immobilized
(complementary) copies of the sorted partials. The arrays with
the immobilized copies can serve as permanent banks of the sorted
partials which can subsequently be amplified over and over to
generate copies for further use. Following sorting, each well in
the array will contain immobilized copies of all of those
partials whose variable end is complementary to the variable
segment of the immobilized oligo. The other (fixed) end of these
partials will be identical to one of the ends of the parental
strands. If an oligo segment occurs more than once in a strand,
or if it occurs in more than one strand in the group of strands
subjected to partialing, then the well will contain a
corresponding number of different partials, all sharing the same
sequence at their variable ends.
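
The content of the wells after partialing and sorting can be
sketched as follows. This is an illustration under simplifying
assumptions: the function name and the toy parental sequence are
hypothetical, priming regions are omitted, the 5' end is taken as
the fixed end, and wells are keyed by the partial's own
3'-terminal (variable-end) oligo rather than by its complement on
the array.

```python
from collections import defaultdict


def one_sided_partials(parent, n):
    """Generate every partial of the parent whose 5' end is preserved and
    whose 3' end is truncated, grouped by the n-nucleotide oligo at the
    partial's variable (3') end."""
    wells = defaultdict(list)
    for end in range(n, len(parent) + 1):
        partial = parent[:end]
        wells[partial[-n:]].append(partial)
    return dict(wells)


wells = one_sided_partials("ACGTACG", 3)
for oligo in sorted(wells):
    print(oligo, wells[oligo])
```

The oligo ACG occurs twice in this parental strand, so its well
holds two different partials that share the same variable end,
while each of the other wells holds one.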

In the second case (sorting before partialing), partials are
prepared directly from the parental strands that are hybridised
to a sectioned ordinary array without prior degradation. A
strand, or a mixture of strands, is hybridised to a 3' ordinary
array. The immobilized oligos are then used as primers for
copying the hybridised strands, beginning at the location within
each bound strand where hybridisation occurred, and ending at the
upstream terminus of each bound strand. After extension of the
immobilised oligos, the hybridised parental strands are
discarded. At this point the wells contain immobilized
(complementary) partial strands. The partials in one well all
share a 5'-terminal oligo segment that is complementary to a
particular internal oligo in the parental strand(s). The partial
strands have 3'-terminal sequences that include the complement of
the 5'-terminal region of the parental strand(s) (which contains
a priming region). Unlike the methods described above for
partialing before sorting, the immobilised complementary partials
will contain a priming region at only one end and therefore
cannot be amplified exponentially. However, their linear
amplification is possible, with the partials being synthesized as
DNAs or RNAs. Where RNA partials are generated, the priming
region at the partial copy's 3' terminus contains an RNA
polymerase promoter. Synthesis of RNA copies is more efficient
than linear synthesis of DNA copies. Alternatively, the
synthesized copies can be provided with second priming regions
and can then be amplified in an exponential manner by PCR. This
approach is illustrated, schematically, in Figure 5.

Figure 5 illustrates the generation of partials for one DNA
parental strand 30 on a 3' sectioned ordinary array. First, the
strand 30 (many copies, of course), such as obtained from well
13a of sorting array 13, is hybridized to the partialing array
31, a 3' sectioned ordinary array, containing well 31a. The
parental strand 30 binds to many different locations within the
array, dependent on which oligo segments are present in the
strand. A hybrid 32 is formed in each well of the array that
contains an immobilized oligo complementary to a strand's oligo
segment. After hybridization, the entire array is washed and
incubated with an appropriate DNA polymerase in order to extend
the immobilized oligos utilizing the hybridized strand as a
template. Each extension product strand 33 is a partial
(complementary) copy of the parental strand. Each partial begins
at the place 32 in the strand where hybridization occurred and
ends at the strand's terminus. The strand preferably terminates
at its 5' terminus with a universal priming sequence 17, such as
one introduced into all strands when sorting strands on a
sectioned binary array as described. This allows for
amplification of the partials. That priming sequence can contain
a restored restriction site 16a. The parental strand may also
contain, if it was previously sorted on a binary sorting array, a
priming sequence at its 3' terminus 17, adjacent to the variable
sequence 100 that the strand was previously sorted by.

The entire array is then vigorously washed, under conditions
that remove the parental DNA strands and other material,
preferably all, that is not covalently bound to the surface. The
areas of the array then contain immobilized strands 33 that are
complementary to a portion of the parental strand. The wells can
then be filled with a solution containing the universal primer
(or promoter complement), an appropriate polymerase, and the
substrates and buffer needed to carry out multiple rounds of
copying of the immobilized partial strands. The array can then
be sealed, isolating the wells from each other, and (linear)
copying can be carried out simultaneously in all of the wells in
the array.

IV. Surveying oligonucleotides with binary arrays

Our invention includes using binary arrays to survey oligos
contained in strands and partials. Binary arrays allow surveying
to be improved as compared with ordinary arrays, and they allow
new types of selective surveying (such as surveying "signature
oligonucleotides").

In surveying, strands first can be randomly degraded into
pieces whose average length slightly exceeds the surveyed length.
After degradation, each resulting nucleic acid piece is ligated
to the same type of oligo (i.e., a constant sequence), that
preferably does not occur anywhere in the internal regions of the
pieces. For example, the sequence of the added oligo can contain
the recognition site of a restriction endonuclease that was used
to digest the DNA prior to fragment sorting. The ligation can be
carried out in solution prior to hybridization, or after
hybridization of the pieces to binary immobilized oligos whose
constant segment is complementary to the oligo to be ligated.
Preferably, a 3' array is used, having upstream constant
segments. The immobilized oligos can then be extended with an
appropriate DNA polymerase, using the hybridized nucleic acid
pieces as templates. It is preferable that after extension all
hybrids have the same length. This can be achieved by employing
dideoxynucleotides as substrates for the polymerase, to restrict
extension to one nucleotide.

Hybrids can be labeled in both a ligation-dependent and an
extension-dependent manner to increase the specificity of hybrid
detection. Also, the ligated oligos and the added
dideoxynucleotides can be tagged with different labels, for
example, fluorescent dyes of different colors. The array is then
scanned at two different wavelengths, and only those areas that
emit fluorescence of both colors indicate perfect hybrids.

Survey results can be improved further by hybrid proofreading,
by destroying hybrids containing mismatches, and by using
chemical or enzymatic methods.

V. Use of the oligonucleotide arrays for the sequencing of
nucleic acids

The arrays and methods of this invention can be used to
determine the nucleotide sequence of nucleic acids, including the
sequence of an entire genome, whether it is haploid or diploid.
This embodiment requires neither cloning of fragments nor
preliminary mapping of chromosomes. It is especially significant
that our method avoids cloning, a labor-intensive and
time-consuming approach that is essentially a random search for
fragments. In a preferred embodiment a comprehensive collection
of whole nucleic acids or fragments is sorted into discrete
groups. The sorted nucleic acids are then amplified with a
polymerase, preferably by PCR.

Sequencing large diploid genomes, such as a human genome,
using the arrays and methods of this invention is shown in Figure
6. We will describe the overall method in general terms. In the
embodiment illustrated in Figure 6 an individual's genomic DNA 40
is digested with a restriction endonuclease and sorted by
terminal sequences into groups of strands using a 3' sectioned
binary sorting array 13, as is described above in Section II and
illustrated in Figure 4.

Next, treating each well 13a of the sorting array separately,
a complete set of partials is prepared for each group of
sorted strands using a sectioned array 31, as is described above
in Section III and illustrated in Figure 5. The partials can be
generated in any chosen manner to make them detectable.

Then the contents of each well 31a of the partialing array
31 are surveyed using a survey array 42, as is described above in
Section IV. Preferably the survey array is a binary array, but
an ordinary array may be used. In the embodiment shown in Figure
6, surveying is performed with a sheet 43 containing miniature
survey arrays 42 that have been printed in a pattern that
coincides with the number and location of the wells 31a. The
oligo information obtained can be used, according to our
invention, to separately determine the nucleotide sequence of
every strand in each group isolated on the sorting array.

To determine the order of the fragments sequenced as
illustrated in the embodiment of Figure 6, genomic DNA 40 is
digested with at least a second restriction endonuclease and
sorted into groups of strands using a 3' sectioned binary sorting
array 44, as is described above in Section II and illustrated in
Figure 4. The contents of each well 44a of the sorting array 44
are surveyed with special survey arrays 45, 46 that identify
"signature oligonucleotides" (described below) in intersite
segments of sorted fragments from different digests. This is
done to determine the order of the fragments relative to one
another without regard to differences between allelic pairs of
fragments. In the embodiment shown in Figure 6 this surveying is
performed with printed sheets 47, 48 that have been printed with
a pattern of miniature arrays 45, 46.

To allocate the ordered allelic fragments to their respective
chromosomes in a diploid organism, fragments are linked
according to their allelic differences. In the embodiment
illustrated in Figure 6, the strands from selected wells of the
sorting array 44 are transferred to a selected well of one of a
series of partialing arrays 49, partials are generated, and the
partials are surveyed using miniature survey arrays 50 on printed
sheets 51. Only the presence of oligos containing allelic
differences in the selected partials needs to be determined to
link a pair of allelic fragments to their respective neighboring
allelic fragments.

When sorting according to the identity of terminal sequences,
each strand occupies a particular "address" in the array.
It is convenient to think of the address as the oligo sequence
within a strand that directs the strand to hybridize to a
particular location, i.e., the sequence that is perfectly
complementary to the variable sequence of the oligo immobilized
at that location. The "address" also identifies the location
within the array where the DNA binds.

After sorting, each group of strands is amplified and
subjected to partialing. Importantly, the isolation of
individual strands is not necessary, because our method allows
the nucleotide sequence of each strand in a mixture to be
determined. In particular, our method allows the sequences of
strands in a well of the sorting array to be determined
separately from mixtures of strands in other wells. In a
preferred embodiment, the partialing array is comprehensive in
order to obtain all possible one-sided partials (i.e., a
comprehensive array). Each group of partials is amplified prior
to surveying. Most preferably, the amplification is carried out
in such a manner that one of the two complementary partial
strands is produced in great excess over the other.

Each group of partials is surveyed to identify their
constituent oligos. Surveying is preferably carried out using
binary arrays.

Although not necessary, it is preferable to have the survey
arrays be as compact as possible. It is anticipated that
surveying will be advantageously accomplished simultaneously for
many or all wells of a partialing array by utilizing a sheet on
which miniature survey arrays have been "printed" in a pattern
that coincides with the arrangement of wells in the partialing
array, in a manner similar to that shown in Figures 6 and 7.
Referring to Figure 7, partialing array 31, comprising an array
of wells 31a, is surveyed using sheet 43, having printed thereon
an array of miniaturized survey arrays 42. The pattern of arrays
42 corresponds to the pattern of wells 31a, whereby all wells 31a
can be surveyed simultaneously.

Automated photolithography techniques for preparing miniature
oligo arrays have been developed [Fodor, S. P., Read, J. L.,
Pirrung, M. C., Stryer, L., Lu, A. T. and Solas, D. (1991).
Light-Directed, Spatially Addressable Parallel Chemical
Synthesis, Science 251, 767-773]. The manufacture of miniature
arrays on a "chip" for use in surveys also has been reported.

Surveying with comprehensive arrays produces a complete list
of oligos contained in the partials in each well of the
partialing array. This will reveal all oligos present in all
partials in that well. The method of this invention can
determine the sequences of the original (parental) fragment
strands.

The "partials" referred to in this section are one-sided
partial strands that begin at the 5' terminus of a parental
nucleic acid strand (the fixed end) and end at different
nucleotide positions in the strand (the variable end). Partials
are sorted in the partialing array according to the identity of
their variable ends, and therefore each partial has a particular
"address" within the array. As with sorting arrays, an "address"
in a partialing array is the oligo sequence that is present at
the variable end of the partial strand and that is complementary
to the variable segment of an immobilized oligo. The "address"
also relates to the location within the array where the partial
strand is found, since the variable segment of the oligo
immobilized in that well is complementary to the oligo at the
partial's variable terminus. The "address" also relates to the
location within the parental strand of a partial's terminal
oligo. The location of this "address oligo" within a parental
strand is characterized by an "upstream subset" of oligos that
come before it in the parental sequence and by a "downstream
subset" of oligos that come after it.

Our method of establishing nucleic acid sequences, for
either a single strand or a group of parental strands sorted by
their terminal sequences, begins by assembling an "address set"
for each address in the partialing array. The "address set" is a
comprehensive list of all oligos in all the parental strands
which have the address oligo within their nucleotide sequences.
The "upstream subset" contains all the oligos that occur upstream
(i.e., towards the 5' end) of the address oligo in parental
strands that contain the address oligo. The "downstream subset"
contains all the oligos that occur downstream (i.e., towards the
3' end) of the address oligo in any parental strands that contain
the address oligo. Together the two subsets form the "address
set."

The upstream subset of each address can be determined
directly from the survey of each well of a partialing array and
consists of a list of all the oligos identified as being present
in the partial strands in that well. The downstream subset of
each address can be inferred by examining the upstream subsets of
all the addresses; the downstream subset of a particular address
consists of those addresses whose own upstream subset includes
that particular address oligo.
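
The inference of downstream subsets from upstream subsets can be
sketched in a few lines of Python. This is only an illustration
under stated assumptions: the function names are hypothetical, the
survey is simulated from known strands rather than measured, and
an upstream subset is taken to hold the oligos lying strictly
upstream of the address oligo.

```python
from collections import defaultdict


def oligos_of(strand, n):
    """Return all n-long oligos of a strand in 5'->3' order."""
    return [strand[i:i + n] for i in range(len(strand) - n + 1)]


def upstream_subsets(strands, n):
    """Simulated survey (assumption): for every address oligo, collect
    the oligos lying strictly upstream of any of its occurrences."""
    upstream = defaultdict(set)
    for strand in strands:
        oligos = oligos_of(strand, n)
        for i, address in enumerate(oligos):
            upstream[address].update(oligos[:i])
    return dict(upstream)


def infer_downstream(upstream):
    """Downstream subset of A = all addresses whose own upstream
    subset includes A, as stated in the paragraph above."""
    downstream = {address: set() for address in upstream}
    for address, ups in upstream.items():
        for oligo in ups:
            downstream[oligo].add(address)
    return downstream


if __name__ == "__main__":
    up = upstream_subsets(["ACGTAC"], 3)
    for address, down in sorted(infer_downstream(up).items()):
        print(address, sorted(down))
```

Only presence or absence of an oligo in a well is used; no
occurrence counts are needed.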

The upstream subset and the downstream subset of a
particular address, taken together, are an "indexed address set."
If an oligo occurs more than once in a strand, it can occur in
both the upstream and the downstream subsets of an address.
Indexed address sets provide the information required to order
the oligos contained in a strand set, as will be described below.

When a mixture of strands is examined, it is also useful to
consider an address set without regard to which oligos occur
upstream and downstream of an address. This is called an
"unindexed address set." Unindexed address sets are decomposable
into strand sets by the method of this invention.

We have discovered that when assembling big strand sets
whose oligos do not all overlap uniquely, it is advantageous to
work with "sequence blocks" rather than with individual oligos.
Sequence blocks are composed of oligos that uniquely overlap one
another in a given strand set. Two oligos contained in a strand
set are said to overlap if they share a terminal (5' or 3') n-1
nucleotide sequence. An overlap is unique if no other oligo than
those two in the strand set has this sequence at its termini.
Here n is the length (in nucleotides) of each of the two oligos
if they are of the same length or, if they are of different
length, n is the length of the shorter one. We use unique
overlaps to construct sequence blocks from the oligos in a strand
set.
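
The construction of sequence blocks from unique overlaps can be
illustrated with the following Python sketch. It is a simplified
reading, not the method as claimed: all oligos are assumed to have
the same length n, the name build_blocks is hypothetical, and an
overlap is accepted only when exactly one oligo ends with the
shared n-1 sequence and exactly one different oligo begins with
it (slightly stricter than the definition above, so that
self-overlapping oligos are never chained).

```python
from collections import defaultdict


def build_blocks(oligos):
    """Chain equal-length oligos through unique (n-1) overlaps."""
    oligos = set(oligos)
    by_prefix, by_suffix = defaultdict(set), defaultdict(set)
    for o in oligos:
        by_prefix[o[:-1]].add(o)
        by_suffix[o[1:]].add(o)

    # next_of[a] = b when the overlap a -> b is unique.
    next_of = {}
    for a in oligos:
        shared = a[1:]
        if by_suffix[shared] == {a} and len(by_prefix[shared]) == 1:
            (b,) = by_prefix[shared]
            if b != a:
                next_of[a] = b

    seen = set()

    def walk(start):
        """Extend a block one nucleotide per uniquely overlapping oligo."""
        block, cur = start, start
        seen.add(start)
        while cur in next_of and next_of[cur] not in seen:
            cur = next_of[cur]
            seen.add(cur)
            block += cur[-1]
        return block

    has_prev = set(next_of.values())
    blocks = [walk(o) for o in sorted(oligos - has_prev)]
    # Oligos left over form closed cycles; open each at its smallest member.
    blocks += [walk(o) for o in sorted(oligos) if o not in seen]
    return blocks


if __name__ == "__main__":
    print(build_blocks(["ATG", "TGC", "GCA", "CAT", "TGA"]))
```

In the example, ATG is followed by both TGC and TGA, so that
overlap is not unique and the strand set splits into two blocks.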

The position of each sequence block relative to the others
is determined from the distribution of the oligos between the
upstream and downstream subsets of every address. This is
accomplished by finding, for each of the blocks, which blocks
occur upstream, and which blocks occur downstream, of that block
by examining the address sets. The address sets are used in
order to generate "block sets." The block sets are address sets
wherein blocks have been substituted for the oligos that comprise
the blocks, including the address oligo. Once the relative
position of the sequence blocks has been determined, they can be
assembled into the final sequence. The assembly is governed by
the following rules: (1) each of the blocks must be used at
least once, (2) the blocks must be assembled into a single
sequence, (3) the ends of neighboring blocks must match each
other (i.e., overlap by an n-1 nucleotide sequence, see above)
and (4) the order of the blocks must be consistent with their
positions relative to one another, as ascertained from the block
sets, as will be clear from the examples.

A sequence block can occur either once in a sequence, or
more than once, and this we determine by examining the block
sets. If a block occurs more than once in a sequence, it will
always be contained in both its own upstream and downstream
subsets. On the other hand, if a block occurs only once in a
sequence, it may or may not be present in its own upstream or
downstream subset. But, if a block is absent from either its
upstream subset, or from its downstream subset, that block occurs
in the strand only once. The relative order of these "unique"
blocks can be determined by noting which of them occur in the
upstream subset, and which of them occur in the downstream
subset, of the others. Once the unique blocks have been ordered
relative to each other, the gaps between them are filled with
blocks that may be non-unique. However, not every gap can
necessarily be filled in with a particular block. There is a
range of locations within which each non-unique block (or
presumably non-unique block) can be present. The range for a
particular block is determined by noting those blocks that always
occur upstream of it, and those blocks that always occur
downstream of it. A gap can be filled in if, and only if, there
is a block or a combination of blocks, whose outer ends have n-1
nucleotide-long perfect sequence overlaps with the ends of the
blocks that form the gap. Because at least two overlaps, each of
low probability, must occur simultaneously, it is highly unlikely
that more than one block, or one combination of blocks, can fill
a gap. If a particular block occurs many times in a strand, it
will have to be used to fill every gap it matches. This is why,
using the method of the invention, it is possible to establish
the sequence of a strand without measuring how many times an
oligo occurs in the partials. It is only necessary to determine
whether an oligo is present or not.
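
The test for blocks that occur only once, and the ordering of those
blocks, can be sketched as follows. This is a hypothetical
illustration: block_sets simulates the block sets from a known
block order (in practice they would come from the survey), and
the function names are not taken from the specification.

```python
def block_sets(order):
    """Simulate block sets (assumption): upstream and downstream
    subsets of every block, given the true order of blocks."""
    upstream = {b: set() for b in order}
    downstream = {b: set() for b in order}
    for i, block in enumerate(order):
        upstream[block].update(order[:i])
        downstream[block].update(order[i + 1:])
    return upstream, downstream


def unique_blocks(upstream, downstream):
    """A block absent from its own upstream subset or from its own
    downstream subset occurs only once in the strand."""
    return {b for b in upstream
            if b not in upstream[b] or b not in downstream[b]}


def order_unique(upstream, downstream):
    """Rank each unique block by how many unique blocks precede it."""
    unique = unique_blocks(upstream, downstream)
    return sorted(unique, key=lambda b: len(upstream[b] & unique))


if __name__ == "__main__":
    # Block R occurs twice; A, B and C occur once.
    up, down = block_sets(["A", "R", "B", "R", "C"])
    print(order_unique(up, down))
```

The remaining block R is present in both of its own subsets, so
it is treated as presumably non-unique and is left for filling
the gaps between A, B and C.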

An important aspect of this invention is the ability to
sequence a mixture of strands simultaneously. The invention can
be used for the determination of fragment sequences from an
entire fragmented and sorted genome.

If one strand is being sequenced, all address sets determined
from a partialing array will contain the same oligos that
constitute the strand set. The only difference is that some
oligos which are downstream in one set may be upstream in another
address set. If a mixture of strands has been partialed on a
single partialing array, certain addresses will be shared by more
than one parental strand. Their address sets will be composite,
containing all of the oligos from all of the strands that the
address oligo is present in. Addresses that are only found in a
particular strand in the mixture, however, will have address sets
which only contain oligos from that strand. They are identical
to the strand set, and each contains the same oligos. The mixture
can contain up to a hundred or so different DNA strands, each of
a different length and sequence, as can be obtained with an
appropriate sorting array (or set of sorting arrays) and method
described above. When a mixture of strands is analyzed on a
partialing array, the data obtained by surveying the partials
will reflect the diversity of the sequences in the mixture, and
will appear to be very complex. However, we have discovered a
way to decompose the unindexed address sets obtained by analysis
of a strand mixture into their constituent strand sets. Then, as
we have described for sequencing a single strand, the oligos in
each of the identified strand sets can be grouped into sequence
blocks that can be ordered from the information contained in the
indexed address sets, as will be clear from the examples.

Unindexed address sets can be either "prime" or "composite."
A prime set consists of one strand set, while a composite set
consists of more than one. A prime set cannot be decomposed into
other address sets, i.e., there is no address set which is a
subset of a prime set. Composite sets, however, can usually be
decomposed into two or more simpler address sets. Once
individual strand sets have been identified, they can each be
treated as though they were obtained from an analysis of a
homogeneous strand. It is thus possible, in many cases, to
sequence all strands in an unknown heterogeneous DNA sample
without first isolating the strands.
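
The distinction between prime and composite unindexed address sets
can be made concrete with a short Python sketch. It is
illustrative only: the address sets are simulated from known
strands, the function names are hypothetical, and "subset" is
read as proper subset. A strand whose every oligo is shared with
another strand would leave no prime set of its own, which is one
reason decomposition is only "usually" possible.

```python
def unindexed_address_sets(strands, n):
    """Simulated data (assumption): for each address oligo, the union
    of the oligo sets of all strands that contain that address."""
    strand_sets = [
        frozenset(s[i:i + n] for i in range(len(s) - n + 1))
        for s in strands
    ]
    address_sets = {}
    for strand_set in strand_sets:
        for address in strand_set:
            previous = address_sets.get(address, frozenset())
            address_sets[address] = previous | strand_set
    return address_sets


def prime_sets(address_sets):
    """Prime sets: address sets with no other address set as a
    proper subset."""
    distinct = set(address_sets.values())
    return {s for s in distinct if not any(t < s for t in distinct)}


def decompose(composite, primes):
    """List the prime sets contained in a composite address set."""
    return [p for p in primes if p <= composite]


if __name__ == "__main__":
    sets = unindexed_address_sets(["ACGTA", "TTACG"], 3)
    primes = prime_sets(sets)
    print(len(primes), len(decompose(sets["ACG"], primes)))
```

Here the oligo ACG is shared by both strands, so its address set
is composite and decomposes into the two prime sets, each of
which equals one strand set.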

The fragment sequences obtained by the methods outlined
above or by any other method can then be put in their correct
order using oligo arrays. Assembling restriction fragments into
contiguous sequences can be accomplished by identifying each
fragment's immediate neighbors. One method for obtaining this
information is to use another restriction enzyme to cleave the
same DNA at different positions, thus producing a set of
fragments that partially overlap neighboring fragments from the
first digest, and then to sequence these fragments. However, it
is not necessary to sequence the fragments in the second
restriction digest. It is only necessary to uniquely identify
overlapping segments in the fragments from alternate restriction
digests. This can be done by surveying "signatures."

Signatures can be determined by hybridization of fragment
strands to complementary oligo probes. A signature of a fragment
may consist of one, two or more oligos, so long as it is unique
within the sequence analyzed. Neighboring fragments from one
restriction digest can be determined by looking for their
signatures in overlapping fragments from an alternate digest.

We have devised a method for identifying neighboring
restriction fragments among the list of sequenced fragments that
does not require either cloning or sequencing of overlapping
fragments. If strands from an alternate digest are sorted,
complementary strands of the same fragment will hybridize to
different addresses in the sorting array. Whenever intersite
segments from two or more fragments of the first digest are
present within one fragment of the second digest, then all of
these segments will be represented in both complementary strands
of that one fragment, and all will be present wherever those
strands bind in a sorting array. We identify the segments by
obtaining their signatures through hybridization to specialized
binary survey arrays. The signatures of intersite segments that
occur in one fragment always accompany each other, whereas
signatures of distant segments travel independently.

After the fragments from an original (first) restriction
digest of a long DNA have been sequenced, the same DNA is
digested with a second (different) restriction endonuclease, the
termini of the generated fragments are provided with universal
priming regions (that also restore the recognition sites at the
termini), and the strands are sorted according to particular
internal sequences, namely, a variable sequence adjacent to the
recognition site for the first restriction enzyme. The sorting
array is a sectioned binary array. It contains immobilized
oligos having a variable sequence as well as an adjacent constant
sequence that is complementary to the recognition sequence of the
first restriction endonuclease. The sorted strands are amplified
by "symmetric" PCR, so that in each well where a strand has been
bound, copies of the bound strand, as well as complements, are
generated. In another embodiment, strands can be sorted
according to their terminal sequences on an array whose oligos'
constant segments include sequences that are complementary to the
recognition site of the second restriction enzyme. This
alternative is not detailed; it corresponds to the embodiment
discussed below, but with terminal sorting.

Each strand that hybridizes to the binary sorting array will
possess at least two recognition sites for the second restriction
enzyme (restored at the strand's termini), and at least one
(internal) recognition site for the first restriction enzyme.
The segments included between these two types of restriction
sites (intersite segments) comprise the overlaps between the two
types of restriction fragments, and each intersite segment is
thus bounded by any two restriction sites of the two types. It
follows that each of these segments can be characterized by
identifying these two restriction sites and variable sequences of
preselected length within the segment that are immediately
adjacent to each of the restriction sites. The combination of a
recognition site (for either the first or the second restriction
enzyme) and its adjacent variable oligo we call a "signature
oligonucleotide." Every intersite segment can be characterized
by two signature oligos (of either type) that bound that segment.
The combination of the two signature oligos is defined herein as
the intersite segment's "signature."
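
The definition of a signature can be illustrated by the following
Python sketch, which reads the two signature oligos of every
intersite segment from a known strand sequence. It is a
hypothetical illustration only: the function name, the choice of
GAATTC and GGATCC as the first and second recognition sites, and
the variable length k = 4 are assumptions made for the example,
and in the method itself signatures are obtained by hybridization
rather than from a known sequence.

```python
import re


def signatures(strand, site1, site2, k):
    """Return (upstream, downstream) signature oligos for each
    segment bounded by two consecutive recognition sites."""
    # Locate every occurrence of either recognition site.
    hits = sorted(
        (m.start(), site)
        for site in (site1, site2)
        for m in re.finditer("(?=" + re.escape(site) + ")", strand)
    )
    result = []
    for (p, left), (q, right) in zip(hits, hits[1:]):
        seg_start, seg_end = p + len(left), q
        if seg_end - seg_start < k:
            continue  # segment too short to hold a k-long variable oligo
        # Site plus the adjacent variable oligo inside the segment.
        upstream = left + strand[seg_start:seg_start + k]
        downstream = strand[seg_end - k:seg_end] + right
        result.append((upstream, downstream))
    return result


if __name__ == "__main__":
    strand = "GGATCC" + "ACGTTTGA" + "GAATTC" + "CCATAGGT" + "GGATCC"
    for pair in signatures(strand, "GAATTC", "GGATCC", 4):
        print(pair)
```

Each pair printed is one signature: the two signature oligos that
bound a single intersite segment.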

After strand amplification, the strands in the wells of the
sorting array are surveyed to identify the signature oligos of
each of the two types. This is carried out by using two types of
binary survey arrays. The first has immobilized oligos
containing a variable oligo segment and a constant segment that
is, or includes, an adjacent sequence that is complementary to
the recognition site for the first restriction endonuclease. The
immobilized oligos in the second survey array have a variable
oligo segment of preferably the same length as the variable
segment of the first specialized survey array, and a constant
segment that is, or includes, an adjacent sequence that is
complementary to the recognition site for the second restriction
endonuclease. The constant oligo segments in these arrays can be
located either upstream or downstream of the variable oligo
segments, resulting in the surveying of either the downstream or
the upstream signature oligos in each strand of the intersite
segments being surveyed. In a preferred embodiment the constant
oligo segments are upstream, and the immobilized oligos have free
3' ends, so that they can be extended by incubation with a DNA
polymerase. From the oligo information that is obtained, the
sequenced fragments can be ordered relative to one another.

In our method, the uniqueness of a signature is achieved by
surveying "half signatures" (signature oligonucleotides) on two
relatively small survey arrays. If the variable segments in the
arrays are 8-nucleotide-long, the number of areas in the two
arrays is approximately 130,000, or approximately 100,000,000
times smaller than the single array that would be needed for
detecting the same size signature (2S nucleotides).

If a diploid genome (such as a human genome) is sequenced,
the ordered fragments will appear as a string of unlinked pairs
of allelic fragments. What remains unknown is how the allelic
fragments in each pair are distributed between the homologous
(sister) chromosomes that came from each parent. Allocation of
the allelic fragments to these "chromosomal linkage groups"
requires knowledge of which fragment in each pair is linked to
which fragment in a neighboring pair.

We have developed a method that uses arrays for allocating
allelic fragments to chromosomes, irrespective of what method was
used for sequencing and ordering the fragments. The linkage of
fragments in neighboring pairs can be achieved by sequencing a
restriction fragment ("spanning fragment") from an alternate
digest that spans at least one allelic difference in each pair.
Since the sequences of the allelic fragments are known, there is
no need to sequence the spanning fragment. Instead, one can
simply determine which oligos that harbor allelic differences
accompany one another in the spanning fragment, i.e., which
oligos occur in the same chromosome. This can be accomplished by
surveying, at a selected address in a partialing array, partials
generated from a selected group of restriction fragments from an
alternate digest. A group of restriction fragments is selected
that contains a spanning fragment, and an address in a partialing
array is selected that encompasses a difference in one of the
neighboring allelic pairs.

Since the sequence of every fragment is known, it is possible
to choose an alternate restriction fragment that spans the
allelic differences in the neighboring pairs. A spanning
restriction fragment, in fact, may already be present at a
particular address in one of the sorting arrays used to sort
alternate digests during the ordering procedure.

In this method, sorted strands are melted apart, and the
mixture is hybridized to a particular well in the partialing
array, whose address corresponds to one of the allelic oligos.
Two different wells are selected, each with an address that
corresponds to an oligo that harbors a different allelic
oligonucleotide. After amplification of the partial strands, the
oligos in the two wells are identified with a survey array.
Examination tells which fragments are on the same chromosome.

Since allelic differences occur roughly once every 1,000
basepairs in the human genome, most allelic fragments resulting
from digestion with a restriction enzyme recognizing a hexameric
sequence (resulting in an average length of about 4,096
basepairs) will differ from each other. If the variable oligo
segments in the survey arrays are made of octanucleotides, then
each allelic nucleotide substitution will give rise to eight
different oligos in each of the allelic fragments. However, using
our method, inspection of only one address in the partialing
array is sufficient to reveal the linkage of the corresponding
reference oligo to any one of the eight oligos that encompass the
nucleotide substitution that occurs in the neighboring fragment
on the same chromosome. Therefore, only one address in the
partialing array is needed to reveal the linkages between two
neighboring allelic pairs. Thus, 65,536 linkages can be
determined on a single comprehensive partialing array made of
variable octanucleotides. With this method, only 10 to 20 of
these arrays would be needed to complete the assembly of an
entire diploid human genome that has been fragmented by a
restriction endonuclease with a hexameric recognition site.
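
The figures in the preceding paragraph follow from simple
counting, which the Python sketch below reproduces. The function
names are illustrative only, and the haploid genome size of
3 x 10^9 basepairs is an assumption introduced here, not a figure
taken from the text.

```python
def mean_fragment_length(site_length: int) -> int:
    """Average spacing of a recognition site in random sequence: 4**n."""
    return 4 ** site_length


def comprehensive_array_size(variable_length: int) -> int:
    """Number of addresses in an array holding every oligo of one length."""
    return 4 ** variable_length


def oligos_covering(sequence: str, position: int, k: int) -> list[str]:
    """All k-long oligo segments of `sequence` that include `position`."""
    first = max(0, position - k + 1)
    last = min(position, len(sequence) - k)
    return [sequence[i:i + k] for i in range(first, last + 1)]


# ASSUMPTION (not from the text): haploid genome of about 3e9 basepairs.
ASSUMED_HAPLOID_BP = 3_000_000_000

# One linkage per neighboring pair of allelic fragments.
neighboring_pairs = ASSUMED_HAPLOID_BP // mean_fragment_length(6)

# Ceiling division: arrays needed at one linkage per address.
arrays_needed = -(-neighboring_pairs // comprehensive_array_size(8))
```

Under the assumed genome size, `arrays_needed` comes out at about
a dozen, which lies within the range of 10 to 20 arrays stated
above.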

Computational methods can be developed to minimize or
eliminate errors that occur during partialing and surveying, by
taking advantage of the high redundancy in the data. Such
methods should take into account the following aspects of a
preferred sequencing procedure: the sequence of every fragment
is independently determined four times (by virtue of each strand
and its complement being present at two different addresses in
the sorting array); each strand set is determined in as many
trials as the number of different oligos in that strand; every
nucleotide in a strand is represented by as many different oligos
as the length (of the variable segment) of the immobilized oligos
in the survey array; the locations where a particular block can
occur in a sequence are limited by the distribution of the blocks
among the upstream and downstream subsets of each pertinent
address; and the edges of a block must be compatible with the
edges of each gap where that block is inserted.
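
One simple way to exploit the fourfold redundancy described above
is a position-by-position majority vote over the independent
determinations. The text does not prescribe any particular
algorithm; the voting rule, the use of "N" for unresolved
positions, and the function names below are all assumptions made
for illustration.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def consensus(strand_reads: list[str], complement_reads: list[str]) -> str:
    """Majority vote per position; 'N' marks a tie between the top calls.

    `complement_reads` are determinations of the complementary strand and
    are converted to the sense of `strand_reads` before voting.
    """
    reads = list(strand_reads) + [reverse_complement(r) for r in complement_reads]
    if len({len(r) for r in reads}) != 1:
        raise ValueError("all determinations must have the same length")
    result = []
    for column in zip(*reads):
        ranked = Counter(column).most_common(2)
        tie = len(ranked) == 2 and ranked[0][1] == ranked[1][1]
        result.append("N" if tie else ranked[0][0])
    return "".join(result)
```

With two determinations of a strand and two of its complement, a
single erroneous call at any position is outvoted three to one.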

Using our genome sequencing method, one can use throughout
essentially the same technology, i.e., hybridization of oligo
probes and the amplification of nucleic acids by the polymerase
chain reaction, both of which are well-studied, common laboratory
techniques. The entire procedure can be performed by a specially
designed machine, resulting in huge reductions in time and cost,
and a marked improvement in the reliability of the data. Many
arrays could be processed simultaneously on such a machine. The
machine most preferably should be entirely computer-controlled,
and the computer should constantly analyze intermediate results.
As stated above, used arrays can be stored, both to serve as a
permanent record of the results, and to provide additional
material for subsequent analysis or for manipulating the
sequenced strands and partials.

Analysis of an individual's genomic DNA provides the complete
nucleotide sequence of that individual's diploid genome. The
genes and their control elements are allocated into chromosomal
linkage groups as they appear in a single living organism. The
sequences will describe an intact, functioning ensemble of
genetic elements. This complete sequencing provides the ability
to compare genomes of individuals, thereby enabling biologists to
understand how genes function together and to determine the basis
of health and disease. The genomes of any species, whether
haploid or diploid, can be sequenced.

The invention can be used not only for DNAs but as well for
sequencing mixtures of cellular RNAs.

The invention is also useful to determine sequences in a 
clinical setting, such as for diagnosis of genetic conditions.

VI. Manipulating Nucleic Acids on Sectioned Arrays

Our invention also includes using sectioned arrays for
introducing site-directed mutations into sequenced nucleic acids,
including the introduction of nucleotide substitutions, deletions
and insertions. This can be carried out in a massively parallel
fashion. In one embodiment, a partial whose variable end has
been deprived of a priming region is ligated to the free terminus
of an immobilized oligo that contains the mutation to be
introduced. In another procedure, where the purpose of
mutagenesis is to introduce a single-nucleotide substitution, the
substituting nucleotide can be added directly to the variable
end of the partial. In both cases, the modified partials or
their complementary copies are used to synthesize a mutant
strand, utilizing as a template either the complementary parental
strand (i.e., from which the partials were generated) or a longer
complementary partial, or any other strand or partial that
encodes the missing region. The fixed end of the mutant partial
is provided with a priming region that is different from the
corresponding priming region of the template strand. Therefore,
only mutant strands are capable of subsequent amplification by
PCR. A single array can be used either to mutate many single
positions in a gene, or to introduce mutations in many genes in
one procedure.

Sectioned arrays can also be used for the massively parallel
testing of the biological effects of the introduced mutations.
For example, parallel coupled transcription-translation reactions
can be carried out in the wells of a sectioned array following
amplification of the mutant strands. It is thus possible to
determine simultaneously, on the same sectioned array, the
effects of many different amino acid substitutions on the
structure and function of a protein.

VII. Examples

1. Sorting nucleic acids or their fragments on a binary
oligonucleotide array whose immobilized oligos have free 3'
termini, with constant upstream segments --

This method allows the immobilized oligos to serve as
primers for copying bound strands, resulting in the formation of
complementary copies covalently linked to the array.

1.1. Sorting restriction fragments according to their
terminal sequences, following the introduction of terminal
priming regions --

DNA is digested using a restriction endonuclease. Recognition
sites for the restriction endonuclease are restored in solution
by introducing terminal extensions (adaptors) that contain a
sequence which, together with the restored restriction site,
forms a universal priming region at the 3' terminus of every
strand in the digest. This priming region is later used for
amplification by PCR. After melting fragments, the strands are
sorted on a sectioned binary array. A sequence complementary to
the generated priming region serves as both the constant segment
of the immobilized oligos and as the primer for PCR amplification
of the bound strands.

DNA to be analyzed is first digested substantially completely
with a chosen restriction endonuclease, and the fragments
obtained are then ligated to synthetic double-stranded oligo
adaptors. The adaptors have one end that is compatible with the
fragment termini. The other end is not compatible with the
fragments' termini. The adaptors can therefore be ligated to the
fragments in only one orientation. The adaptors' strands are
non-phosphorylated, which prevents their self-ligation. The
strands in the restriction fragments have their 5' termini
phosphorylated, which results from their cleavage by a
restriction endonuclease. This favors the ligation of the
adaptors by a DNA ligase (such as the DNA ligase of T4
bacteriophage) to the restriction fragments, rather than to each
other. Since DNA ligase catalyzes the formation of a
phosphodiester bond between adjacent 3' hydroxyl and
phosphorylated 5' termini in a double-stranded DNA, the
phosphorylated 5' termini of the fragments are ligated to the
adaptor strand whose 3' end is at the compatible side of the
adaptor. The 3' termini of the fragments remain unligated. A DNA
polymerase possessing a 5'→3' exonuclease activity (such as DNA
polymerase I from Escherichia coli or Taq DNA polymerase from
Thermus aquaticus) is then used to extend the 3' ends of the
fragments, utilizing the ligated oligo as a template, concomitant
with displacement of the unligated oligo. To make the ligated
oligo resistant to the 5'→3' exonuclease, the ligated oligo can
be synthesized from α-phosphorothioate precursors.

Although the oligo adaptors are provided in great excess
during the ligation step, there is still a low probability that
two restriction fragments will ligate to one another, rather than
to the adaptor. To prevent this, the ligation products can again
be treated with the restriction endonuclease used to generate the
fragments, in order to cleave the formed interfragment dimers.
The endonuclease will not cleave the ligated adaptors if they are
synthesized from modified precursors (such as nucleotides
containing N6-methyl-deoxyadenosine), which are known and
currently commercially available (e.g., from Pharmacia LKB).
Resistance of the ligated adaptors to digestion by the
restriction endonuclease can be increased further if the ligated
oligo is synthesized from phosphorothioates, and if
phosphorothioate analogs of the nucleoside triphosphates are used
as substrates for extension of the 3' termini.

After the priming regions have been added, the complementary
strands are melted apart, such as by increasing temperature
and/or by introducing denaturing agents such as guanidine
isothiocyanate, urea, or formamide. The resulting strands are
hybridized to a binary sorting array, such as by following a
standard protocol for the hybridization of DNA to immobilized
oligos. Hybridization is performed so that formation of only
perfectly matched hybrids is promoted. The hybrids have a length
which is equal to that of the immobilized oligos. The
immobilized oligos are attached to the array at their 5' termini
and contain constant restriction site segments adjacent to a
variable segment of predetermined length. Each strand will be
bound to the array at its 3' terminus. Its location within the
array will be determined by the identity of the oligo segment
that is located in the strand immediately upstream from the
restored restriction site at its 3' end, and that is
complementary to the variable segment of the immobilized oligo to
which it is bound. After hybridization and washing away all
unbound material, the entire array is incubated with a DNA
polymerase, such as Taq DNA polymerase or the DNA polymerase of
bacteriophage T7, and with substrates (deoxyribonucleotide
5' triphosphates). As a result, the 3' end of each immobilized
oligo to which a strand is bound will be extended to produce a
complementary copy of the bound strand. The array is vigorously
washed. The wells are then filled with a solution containing
universal primer, an appropriate DNA polymerase, and the
substrates and buffer needed to carry out PCR. The array is then
sealed, isolating the wells from each other, and exponential
amplification is carried out, preferably simultaneously, in each
well.
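
The rule that decides where a strand lands in the sorting array
can be sketched in Python as follows. This is a minimal model
that assumes perfect hybridization and ignores all chemistry; the
function name `sort_strands` and its parameters are hypothetical,
and each well is keyed by the variable segment of its immobilized
oligo, written 5' to 3'.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def sort_strands(strands, priming_region, variable_length):
    """Group strands (written 5'->3') by the well whose oligo they match.

    A strand binds only if it ends with `priming_region` (the restored
    restriction site plus adaptor sequence) and has a full-length segment
    immediately upstream of it. The well key is the reverse complement of
    that segment, i.e. the variable segment of the immobilized oligo.
    """
    wells = defaultdict(list)
    for strand in strands:
        if not strand.endswith(priming_region):
            continue  # no 3'-terminal priming region: not retained
        upstream = strand[: len(strand) - len(priming_region)]
        if len(upstream) < variable_length:
            continue  # too short to form a full-length hybrid
        segment = upstream[-variable_length:]
        wells[reverse_complement(segment)].append(strand)
    return dict(wells)
```

Two strands that differ internally but share the same oligo
segment next to the terminal priming region fall into the same
well, which is why a sorted well can still hold a small group of
strands rather than a single one.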

1.2. Sorting restriction fragments according to their
terminal sequences, with 3' and 5' terminal priming regions being
introduced, one before and one after strand sorting --

This procedure consumes larger amounts of enzymes and
substrates than the procedure described in Example 1.1; however,
only those strands that are correctly bound to the immobilized
oligos acquire both priming regions necessary for PCR. The
possibility that non-specifically bound strands will be amplified
is minimized. Furthermore, different priming regions can be
introduced at different termini of a strand. It then becomes
possible to: (1) perform "asymmetric" PCR, where only one of the
complementary strands is accumulated in significant amounts, and
remains single-stranded; (2) introduce a transcriptional promoter
into only one of the priming regions, in order to be able to
obtain RNA transcripts of only one strand (without also producing
its complement); (3) differentially label complementary strands;
and (4) avoid self-annealing of the strand's terminal segments
that can interfere with primer hybridization and lower PCR
efficiency.

In this example, digestion of DNA, adaptor ligation and
re-digestion of fragments are carried out as described in Example
1.1, above. The 3' ends of the restriction fragments, however,
are not extended by incubation with DNA polymerase. Instead, the
strands ligated at their 5' ends to adaptors are melted apart
from their unextended complements and hybridized to a binary
array. The array contains immobilized oligos that are
pre-hybridized with shorter complementary 5'-phosphorylated
oligos that cover (mask) the immobilized oligos except for a
segment which includes a variable region and a region
complementary to the portion of the restriction site remaining at
the fragments' (unrestored) 3' end. The masked region includes
the rest of the restriction site and any other constant sequence,
such as may be included in a priming region. Hybridization is
carried out under conditions that promote the formation of only
perfectly matched hybrids which are the length of the unmasked
segment of the immobilized oligo. After washing away the unbound
strands, the strands that remain bound are ligated to the masking
oligos by incubation with DNA ligase. The correctly bound strands
thus acquire a priming region at their 3' end, in addition to the
priming region they already have at their 5' end. The two
priming regions preferably correspond to different primers. The
array is then washed under appropriately stringent conditions to
remove all nucleic acids except the immobilized oligos and the
ligated strands hybridized to them.

1.3. Sorting RNAs according to their terminal sequences --
Mature eukaryotic mRNAs share structural features that can
help in their manipulation using arrays. All have a "cap"
structure on their 5' end, and most also possess a 3'-terminal
poly(A) tail, which is attached posttranscriptionally by a
poly(A) polymerase. Because there are usually no long oligo(A)
tracts in the internal regions of cellular RNAs, the poly(A) tail
can serve as a naturally occurring terminal priming sequence in
sorting. The size of mRNAs (several thousand nucleotides in
length) allows them to be amplified and analyzed directly,
without prior cleavage into fragments.

There are known methods for preparing essentially undegraded
total cellular RNA. Total cellular RNA is converted into
complementary DNA (cDNA) using an oligo(dT) primer and a reverse
transcriptase or Thermus thermophilus DNA polymerase. Then,
omitting second strand synthesis, single-stranded cDNAs (which
possess oligo(dT) extensions at their 5' end and variable 3'
termini) are sorted according to their 3'-termini on a sectioned
binary array and are ligated there to pre-hybridized adaptors of
a predetermined sequence that are complementary to the
immobilized oligos' constant sequences and that introduce into a
cDNA molecule the 3'-terminal priming site. The cDNA is amplified
using two primers for PCR: oligo(dT) and an oligo complementary
to the adaptor.

2. Preparing partial strands of nucleic acids on
oligonucleotide arrays --

There are two aspects to this procedure: first, the
generation of partial strands (partials), and second, the sorting
of partials according to their terminal oligo segments. All of
the embodiments described below are based on the following
principle: in generating partials from a strand, one of the
original strand ends is preserved (it will be referred to as the
"fixed" end), whereas the other end is truncated to a different
extent in the various partials (it will be referred to as the
"variable" end). Although either the 5' or the 3' end of the
original strand can serve as the fixed end, it is preferable that
the 5' end be fixed. If amplification of sorted partials is
desirable, it is preferable that the 5' end of the original
strand, i.e., the fixed end, be provided with a priming region
prior to partialing by any of the methods described above, and
that partialing be carried out on a sectioned array. Either an
individual strand or a mixture of strands can be subjected to
partialing; however, if the mixture is very complex (such as a
restriction digest of a large genome) it is desirable that the
mixture first be sorted into less complex groups of strands, as
described above. The groups of strands used for preparing
partials should essentially be devoid of contaminating strands;
therefore, sorting by terminal sequences is preferable for the
preliminary sorting. If preliminary sorting is performed, the
strands will already contain terminal priming regions necessary
for amplification of the partials. Partialing can be performed on
either DNA or RNA, the final product being either DNA or RNA, in
either a double-stranded or a single-stranded state.

2.1. Methods employing enzymatic cleavage of DNA fragments --

The purpose of the cleavage is to produce a set of partials
of every possible length; therefore, DNA should be cleaved as
randomly as possible, and to the extent that there is
approximately one cut per strand. Deoxyribonuclease I (DNase I)
cleaves both double-stranded and single-stranded DNA; however,
double-stranded DNA is preferable as the starting material for
preparing partials because of its essentially homogeneous
secondary structure, so that every segment of a DNA molecule is
equally accessible to cleavage. Double-stranded DNA fragments are
produced as a result of "symmetric" PCR that can be carried out
when sorting strands. An advantage of using DNase I is that it
produces fragments with 5'-phosphoryl and 3'-hydroxyl termini
that are suitable for enzymatic ligation.

After cleavage of the double-stranded DNA fragments, DNase
is removed, e.g., by phenol extraction. The (partial) strands
are then melted apart and are hybridized to a sectioned binary
array, wherein the immobilized oligos are pre-hybridized with
shorter complementary 5'-phosphorylated oligos of a constant
sequence that cover (mask) the immobilized oligos except for a
segment that consists of a variable sequence. Hybridization is
carried out under conditions that favor the formation of
perfectly matched hybrids of a length that is equal to the length
of the unmasked (variable) segment of the immobilized oligo, and
that minimize the formation of imperfectly matched hybrids.
After washing away unbound strands, the bound strands are ligated
to the masking oligos by incubation with a DNA ligase. The
ligated masking oligos will themselves serve as the second
(3'-terminal) priming region of a partial strand. (All the
partials of a strand will share the same 5' priming sequence that
had been introduced into the strand before generation of the
partials.) If restriction fragments are to be partialed that
possess some restriction site at their termini and do not possess
this site internally, it is preferable that the 3' terminal
priming region added to the partials include that site. This
increases the specificity of terminal priming during subsequent
amplification of the partials by PCR. Subsequent extension,
washing, and amplification steps are as described in Example 1.1.
If the partials are prepared for the purpose of sequence
determination, asymmetric PCR can be performed. Alternatively, an
RNA polymerase promoter sequence can be included in one of the
two primers, and amplified DNA is then transcribed to produce
multiple single-stranded copies of one of the two complementary
partial strands.
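
The generation of partials by a single random cut, followed by
sorting according to the oligo at the variable end, can be
modeled as below. This is an idealized sketch: it assumes exactly
one cut per molecule, keeps only the piece that retains the fixed
5' end, and keys each well by the partial's own 3'-terminal
oligo. The function names are hypothetical.

```python
import random


def random_partials(strand, n_molecules, rng):
    """Cut each copy of `strand` once at random and keep the 5' piece.

    `strand` is written 5'->3' and must be at least two nucleotides long.
    `rng` is a random.Random instance, passed in so runs are repeatable.
    """
    return [strand[: rng.randint(1, len(strand) - 1)] for _ in range(n_molecules)]


def sort_by_variable_end(partials, k):
    """Group partials by the k nucleotides at their 3' (variable) end."""
    wells = {}
    for partial in partials:
        if len(partial) >= k:  # shorter pieces cannot form a full hybrid
            wells.setdefault(partial[-k:], set()).add(partial)
    return wells
```

Because every partial shares the fixed 5' end, the partials found
in a given well are exactly the prefixes of the strand that end
with that well's oligo.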

2.2. Methods employing chemical degradation of DNA --
These methods are applicable to both double-stranded and
single-stranded nucleic acids. Chemical degradation is, in most
cases, essentially random. It can be performed under conditions
that destroy secondary structure, and the small size of the
modifying chemicals gives them ready access to nucleotides in
secondary structures.

Both base-nonspecific reagents and base-specific reagents
can be used. In the latter case, after base-specific cleavage is
performed separately with several portions of the sample, the
portions are mixed together to form a set of all possible partial
DNA lengths. The main drawback to chemical cleavage is that the
location of the terminal phosphate groups on the fragments is
opposite to what is required for enzymatic ligation: 5'-hydroxyl
and 3'-phosphoryl groups are produced in most cases. To overcome
this problem, enzymatic dephosphorylation of 3' ends can be
carried out.

2.3. Method of preparing partials directly on a sectioned
array, without prior degradation of nucleic acids --

In this embodiment, the generation of partials and their
sorting according to the identity of the sequences at their
variable ends occur essentially in one step. First, a strand or
a group of strands (if double-stranded nucleic acid is used as a
starting material, the complementary strands are first melted
apart) is directly hybridized to a sectioned ordinary array,
whose oligos only comprise variable sequences of a pre-selected
length, and that are immobilized by their 5' termini. Optimally,
hybridization is carried out under conditions in which hybrids
can only form whose length is equal to the length of the
immobilized oligo. If the array is comprehensive, then a hybrid
is formed somewhere within the array for every oligo that occurs
in a DNA's sequence. After hybridization, the entire array is
washed and incubated with an appropriate DNA polymerase in order
to extend the immobilized oligo, using the hybridized strand as a
template. Each product strand is a partial (complementary) copy
of the hybridized strand. Each partial begins at the place in
the strand's sequence where it has been bound to the immobilized
oligo and ends at the priming region at the 5' terminus of the
strand. If a priming region has not been introduced at the
strand's 5' end before partialing, it can be generated at this
step, after the hybrids that have not been extended are
eliminated by washing. This can be done either by ligating the
5' end of the bound strand to a single-stranded
oligoribonucleotide adaptor, or by tailing the immobilized
partial copy with a homopolynucleotide. The entire array is
vigorously washed under conditions that remove the original
full-length strands and essentially all other material not
covalently bound. Subsequent amplification of the immobilized
partials can be carried out in different ways, dependent on
whether it is desired to use linear or exponential amplification.
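
The one-step generation and sorting of partials described above
can be sketched as follows. The model assumes a comprehensive
array, perfect hybridization, and complete extension to the 5'
end of the template; the function name `partials_on_array` is
hypothetical. Each well is keyed by its immobilized oligo and
holds the extended, complementary partials, all written 5' to 3'.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def partials_on_array(strand, k):
    """Map each immobilized oligo to the extension products it yields.

    An immobilized oligo binds wherever its complement occurs in `strand`.
    Extension of its free 3' end copies the template from the binding site
    to the template's 5' end, so the product is the reverse complement of
    the strand prefix that ends with the bound segment.
    """
    wells = {}
    for start in range(len(strand) - k + 1):
        immobilized = reverse_complement(strand[start:start + k])
        product = reverse_complement(strand[: start + k])
        wells.setdefault(immobilized, []).append(product)
    return wells
```

An oligo that occurs more than once in the strand gives rise to
more than one partial in the same well, as the second assertion in
a quick check such as `partials_on_array("ACAC", 2)` makes plain.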

Exponential copying results in the generation of partials
and their complements. For a strand to be exponentially
amplified by PCR, both of its termini should be provided with a
priming region, preferably different priming regions. The
immobilized (complementary) partial contains only one
(3'-terminal) priming region, and a complementary copy produced
by linear copying would also have only one priming region (on its
5' end). For RNA copies to have a priming region at their 5'
ends, the immobilized partial should have been provided with an
RNA polymerase promoter downstream of its 3' terminal priming
region using the methods described herein. The second priming
region that is needed for exponential amplification can be
introduced at the 3' ends of the complementary copies as follows.

(a) The 3' termini of RNA copies can then be ligated to
oligoribonucleotide or oligodeoxyribonucleotide adaptors which
are phosphorylated at their 5' end and whose 3' end is blocked.
Exponential PCR can be performed by utilizing the two primers
that correspond to the two priming regions, and then incubating
with Tth DNA polymerase.

(b) If the amplified copies are DNA, they can be
transferred, such as by blotting (after melting them free of the
immobilized partial), onto a binary array that is a mirror copy
of the first array in the arrangement of the variable segments of
its immobilized oligos. The constant segments of this binary
array are pre-hybridized to masking oligos whose ligation to the
3' termini of the transferred DNAs (by DNA ligase) results in
generation of the second priming region to permit exponential
PCR.

In methods (a) and (b), both priming regions preferably
contain, when applicable, the recognition sequence of the
restriction endonuclease that was used to digest the genomic DNA
before full-length strand sorting, and which had thus been
substantially eliminated from the strands' internal regions.

(c) If partials are surveyed only for oligos that occur in
one complementary strand (such as detecting only parental
oligos), either only one of the two different primers should be
labeled, or the primers should be labeled differently. It is
also possible to use labeled substrates during asymmetric PCR.

3. Surveying oligonucleotides with binary arrays --
Surveying oligo content can be carried out in the different
embodiments of the invention by hybridization of strands (or
partials) to an ordinary array, followed by detection of those
hybridized. However, the signal-to-noise ratio is not high
enough to always avoid ambiguous results. The most significant
problem is inability to sufficiently discriminate against
mismatched basepairs that occur at the ends of hybrids. That
hampers analysis of complex sequences. The use of binary arrays
helps to overcome this problem.

Binary arrays are also useful for surveying longer oligos
than are easily surveyed on an ordinary array (e.g., signature
oligos) without increasing the size over that of an ordinary
array.

Immobilized oligos in a binary survey array can have either
free 5' or 3' ends, and the constant segment can be either
upstream or downstream. In most cases, it is preferable that the
3' ends of immobilized oligos be free, and that their constant
segments be upstream.

Surveying can utilize sectioned arrays. However, the use of
plain arrays is preferable because they are less expensive and
more amenable to miniaturization. The following methods are
based on the use of plain binary arrays and involve fragmentation
of the strands or partials prior to surveying.

3.1. Comprehensive surveys of DNA strands --

Every oligo present in a strand or in a partial, or in a
group of strands or partials, is surveyed. If a survey of
partials is performed in order to establish nucleotide sequences,
it is preferable that each partial be represented by the same
sense copies. Thus, there should be only one of the
complementary strands in a sample or the complementary strands
should be differentiable, e.g., one strand should produce either
no detectable signal or a weaker signal. This can be
accomplished by amplifying the partials linearly or by the use of
asymmetric PCR.

DNA strands (or partials) to be surveyed are preferably
digested with nuclease S1 under conditions that destabilize DNA
secondary structure. The digestion conditions are chosen so that
the DNA pieces produced are as short as possible, but at the same
time, most are at least one nucleotide longer than the variable
segment of the oligos immobilized on the binary array. If the
surveyed strands or partials have been previously sorted and
amplified on a sectioned array, this degradation procedure can be
performed simultaneously in each well of that array.
Alternatively, if it is desired to store that array as a master
for later use, the array can be replicated by blotting onto
another sectioned array. The DNA is then amplified within the
replica array by (asymmetric) PCR, prior to digestion with
nuclease S1.

After digestion, the nuclease is inactivated by, for example,
heating to 100°C, and the DNA pieces are hybridized to an array
whose immobilized oligos' constant segments are pre-hybridized to
5'-phosphorylated complementary masking oligos. Preferably, the
constant segment contains a restriction site that has been
eliminated from the internal regions of the strands prior to
sorting and is long enough so that its hybrid with the masking
oligo is preserved during subsequent procedures.

The array is incubated with DNA ligase to ligate the masking
oligos to only those hybridized DNA strands (or partials) whose
3' terminal nucleotide is immediately adjacent to the 5' end of
the masking oligo, and matches its counterpart in the immobilized
oligo. DNA ligase is especially sensitive to mismatches at the
junction site.

After all non-ligated DNA pieces have been washed away under
much more stringent conditions than were used during
hybridization, the immobilized oligos are extended by incubation
with a DNA polymerase, preferably by only one nucleotide, using
the protruding part of the ligated DNA piece as a template, and
preferably using the chain-terminating 2',3'-dideoxynucleotides
as substrates. Extension is only possible if the 3'-terminal
base of the immobilized oligo forms a perfect basepair with its
counterpart in the hybridized DNA piece. The use of the
dideoxynucleotides ensures that all hybrids are extended by
exactly one nucleotide and that all are of the same length. The
array is then washed under conditions sufficiently stringent to
remove unextended hybrids.
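
The two requirements for a signal in this survey, ligation at the
3' end of the DNA piece and one-nucleotide extension templated by
its protruding part, can be expressed in a short sketch. It
assumes error-free ligation and extension; the function name
`survey_signals` is hypothetical, and each address is named by
the oligo segment of the piece that pairs with the variable
segment of the immobilized oligo.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def survey_signals(pieces, k):
    """Return {address: set of dideoxynucleotides incorporated there}.

    `pieces` are DNA fragments written 5'->3'. The k nucleotides at the 3'
    end of a piece pair with the variable segment and its 3' terminus is
    ligated to the masking oligo. The nucleotide just upstream of those k
    protrudes and templates the single-nucleotide extension. A piece of k
    nucleotides or fewer leaves no template and gives no signal.
    """
    signals = {}
    for piece in pieces:
        if len(piece) <= k:
            continue  # nothing protrudes: cannot be extended
        address = piece[-k:]
        template_base = piece[-k - 1]
        signals.setdefault(address, set()).add(template_base.translate(COMPLEMENT))
    return signals
```

This is why the digestion conditions above aim for pieces at
least one nucleotide longer than the variable segment: shorter
pieces drop out at the extension step.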

3.2. Detection of hybrids --

Hybrids can be detected by a number of different means.
Unlabeled hybrids can be detected by using surface plasmon
resonance techniques, which currently can detect 10* to 10*
hybrid molecules per square millimeter. Alternatively, hybrids
can be conventionally labeled, such as with radioactive or
fluorescent groups. Fluorescent labels are convenient.

To ensure the lowest level of background labeling, it is
preferable to label hybrids in a manner such that their detection
is dependent on the success of both a ligation and an extension
step. This can be accomplished within the scheme of oligo
surveying by labeling the masking oligos and the
2',3'-dideoxynucleotides used for the extension with fluorescent
dyes possessing different emission spectra. The array can then be
scanned at different wavelengths, corresponding to the emission
maxima of the two dyes, and only signals from those areas that
emit fluorescence of both colors are taken as a positive result.
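
The two-color rule amounts to a logical AND over the two scans,
as in the sketch below. The thresholds, the dictionary layout
keyed by array area, and the function name are assumptions chosen
for illustration; the text does not specify how a signal is
judged to be present.

```python
def positive_areas(ligation_signal, extension_signal,
                   ligation_threshold, extension_threshold):
    """Return the areas that fluoresce above threshold in both channels.

    Each signal argument maps an array area (any hashable label) to the
    intensity measured at the emission maximum of one dye: the dye on the
    masking oligo (ligation) or the dye on the dideoxynucleotide
    (extension).
    """
    return {
        area
        for area in ligation_signal.keys() & extension_signal.keys()
        if ligation_signal[area] > ligation_threshold
        and extension_signal[area] > extension_threshold
    }
```

An area that shows only one color, for instance from a masking
oligo that stayed bound without a ligated piece being extended,
is therefore not scored.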

After hybrids are extended (concomitant with labeling) and
edited, the array is thoroughly washed to remove unincorporated
label, destroy unextended hybrids, and discriminate one more time
against mismatched hybrids that might have remained. A preferred
method is to wash the array at steadily increasing temperature,
with the signal from each area being read at a pre-determined
time, when the conditions ensure the highest selectivity for the
particular hybrid that forms in that area. Other conditions
(such as denaturant and/or salt concentration) can also be
controlled over time. The fluorescence pattern can be recorded
at predetermined time intervals with a scanning microfluorometer,
such as an epifluorescence microscope.

4. Determination of the nucleotide sequences of strands in
a mixture when each strand possesses at least one oligo that does
not occur in any other strand in the mixture --

Figures 8 to 11 depict the determination of the sequences of
two mixed strands using the methods of the invention. The
example demonstrates the power of the invention to identify all
the oligos present in a strand (i.e., its strand set) when it
possesses at least one oligo that does not occur in any other
strand in the mixture. In particular, the example demonstrates:
(a) how the data obtained by surveying the partial strands
generated from a mixture of strands and sorted by their variable
termini (i.e., the upstream subset of each address) and the
inferred downstream subset of each address (which together form
the indexed address sets) are used to construct the unindexed
address sets; and (b) how the unindexed address sets are compared
to each other to identify prime sets. The example also
demonstrates how the oligos contained in a strand set are
assembled into the sequence of the strand, even though the
primary data are obtained from a mixture. In particular, the
example demonstrates: (a) how oligos in a strand set are
assembled into sequence blocks; (b) how the contents of the
indexed address sets are filtered so that only information
pertaining to the oligos in a particular strand set remains;
(c) how this filtered data is re-expressed in terms of the
sequence blocks that are contained in that particular strand;
(d) how information in the resulting "block sets" is used to
identify those blocks that definitely occur only once in the
strand ("unique blocks") and to identify those that can
potentially occur more than once; (e) how information in block
sets of unique blocks is used to determine the relative order of
the blocks that occur only once in the strand; (f) how the
information in the block sets limits the positions at which the
other blocks can occur (relative to other blocks); and (g) how a
consideration of the sequences at the ends of blocks, in
combination with a consideration of the relative positions of
the blocks, leads to the unambiguous determination of the
complete sequence of the strand. This example also illustrates:
(a) how oligos that occur more than once in a strand are
identified and located within the sequence, even though the
survey data contain no information as to the number of times a
particular oligo occurs in a partial or a mixture of partials
having the same terminal oligo; and (b) how the sequences of
different strands in a mixture can be determined separately,
despite the fact that many of the oligos occur in more than one
strand.

Figure 8a shows the sequences of two short strands (parental
strands) that are assumed to be present in a mixture (with no
other strands). It is assumed that complete sets of partials
have been generated from this mixture, and that each set of
partials has been separately surveyed, with the partials sharing
the same address oligo being surveyed together. For the purpose
of illustrating the method of analysing the data, it is assumed
that the address oligos and the surveyed oligos are three
nucleotides in length. In practice, longer oligos should be used.
However, for illustration it is easier to comprehend an example
based on trinucleotides. The same methods of analyzing the data
apply when longer oligos are surveyed, when much longer strands
are in the mixture, and when the mixture contains many more
strands.

Figure 8b shows the upstream subsets determined by surveying
and the downstream subsets inferred (i.e., Figure 8b shows
indexed address sets). The address oligos (bold letters) are
listed vertically in the center of the diagram. The oligos
listed horizontally to the left of each address oligo are those
oligos that were detected in a survey of the partials at that
address (the upstream subset). The oligos listed horizontally to
the right of each address oligo are those inferred from the
upstream subsets to occur downstream of that address oligo (the
downstream subset). For example, oligo "ACC" is contained in the
upstream subset of the address oligo "CCT". This means that
oligo "CCT" occurs downstream of oligo "ACC" in at least one
strand in the mixture. Therefore "CCT" is inferred to be in the
downstream subset of address set "ACC". The remaining downstream
oligos in all of the address sets are similarly inferred. Note
that an address oligo is a member of its own upstream and
downstream subsets.
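
The inference of downstream subsets from upstream subsets can be sketched in a few lines. Figure 8a is not reproduced in this text, so the two strands below are an assumption: they were reconstructed from the sequence blocks named later in this example. The function names are likewise hypothetical.

```python
from collections import defaultdict

# Assumed strands, reconstructed from the sequence blocks described in the text.
STRAND_A = "CATGGTACCTTGGTAA"
STRAND_B = "ATGGTCCTTGGTACCTA"

def oligos(seq, n=3):
    """All n-nucleotide oligos of seq, in 5'->3' order."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]

def upstream_subsets(strands, n=3):
    """Simulate the survey: a partial ends at its address oligo, so the upstream
    subset of an address holds every oligo seen up to and including it."""
    up = defaultdict(set)
    for strand in strands:
        seen = set()
        for oligo in oligos(strand, n):
            seen.add(oligo)
            up[oligo] |= seen
    return dict(up)

def downstream_subsets(up):
    """Infer: if x is in the upstream subset of address a, a is downstream of x."""
    down = defaultdict(set)
    for address, members in up.items():
        for oligo in members:
            down[oligo].add(address)
    return dict(down)

up = upstream_subsets([STRAND_A, STRAND_B])
down = downstream_subsets(up)
print(sorted(up["CCT"]))    # oligos detected at address "CCT"
print(sorted(down["ACC"]))  # oligos inferred to lie downstream of "ACC"
```

With these assumed strands, "ACC" appears in the upstream subset of "CCT", so "CCT" lands in the downstream subset of "ACC", as in the example above.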

After the indexed address sets of all addresses in the
parental strands have been determined (as shown in Figure 8b),
the information is organized into unindexed address sets (Figure
8c), having no division into downstream and upstream subsets, but
merely listing, for each address oligo, those oligos that occur
in either the upstream or downstream subset (or in both). In
Figure 8c, the address oligos (bold letters) are listed
vertically on the left side of the diagram. Note that the address
oligo is a member of its own unindexed address set.

Unindexed address sets are grouped together according to the
identity of the oligos they contain (Figure 8d). Unindexed
address sets that contain an identical set of oligos are grouped
together. It can be seen that three groups of address sets are
formed in this example. The groups are identified by Roman
numerals (I, II, and III). The address oligos of each group (for
example, CTA, GTC, and TCC in group II) always occur together in
a strand and can occur together in more than one strand.
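
A short sketch of the two steps just described: merging each indexed address set into an unindexed one, then grouping addresses whose sets are identical. The strands are an assumption (Figure 8a is not reproduced here; they were reconstructed from the blocks named later in the example), and the helper names are hypothetical.

```python
from collections import defaultdict

STRAND_A = "CATGGTACCTTGGTAA"   # assumed, see lead-in
STRAND_B = "ATGGTCCTTGGTACCTA"  # assumed, see lead-in

def indexed_address_sets(strands, n=3):
    """Return (upstream, downstream) subsets keyed by address oligo."""
    up, down = defaultdict(set), defaultdict(set)
    for strand in strands:
        seen = []
        for i in range(len(strand) - n + 1):
            address = strand[i:i + n]
            seen.append(address)
            for oligo in seen:
                up[address].add(oligo)    # oligo lies upstream of address
                down[oligo].add(address)  # so address lies downstream of oligo
    return up, down

def unindexed_address_sets(up, down):
    """Merge both subsets of every address into one unordered set."""
    return {a: frozenset(up[a] | down[a]) for a in up}

def group_identical(address_sets):
    """Group address oligos whose unindexed address sets are identical."""
    groups = defaultdict(list)
    for address, members in address_sets.items():
        groups[members].append(address)
    return {members: sorted(addresses) for members, addresses in groups.items()}

up, down = indexed_address_sets([STRAND_A, STRAND_B])
groups = group_identical(unindexed_address_sets(up, down))
for members, addresses in sorted(groups.items(), key=lambda kv: len(kv[0])):
    print(len(members), addresses)
```

Under these assumptions three groups emerge, holding two, three, and nine address oligos, which is consistent with the three groups described for Figure 8d.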

Each group of identical address sets is then compared to all
other groups of identical address sets to see if its common
address set appears to be a prime set, by seeing whether any
other address set is a subset of it. For example, in Figure 8d,
the address set common to group III is not a prime address set,
because the address set common to group I is a subset of the
address set common to group III. However, the address set common
to group I and the address set common to group II appear to be
prime address sets.

Each putative prime address set is then tested to see if it
is a strand set by examining all the address sets that contain
all of the oligos that are present in it. For example, in Figure
9a, all the address sets that contain all the oligos present in
the putative prime address set common to group I are listed
together (namely the address sets contained in groups I and III).
The address oligos are shown in bold letters on the left side of
the diagram, and the groups are identified by Roman numerals.
The address set common to group I is indeed a prime address set
(and therefore it contains a single strand set) because a list of
the eleven oligos that are found in every address set in the
diagram (they are seen as full columns) is identical to the list
of eleven addresses on the left side of the diagram. Similarly,
Figure 9b shows why the address set common to group II is also a
prime set. The twelve oligos common to every address set in the
diagram are all found in the list of twelve addresses on the left
side of the diagram. Had either of these putative prime address
sets not turned out to be a prime set (by the criterion described
above), then it would have been identified as a pseudo-prime
address set, and further analysis would have been required to
decompose it into its constituent strand sets.
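
The two tests above (putative prime, then strand set) can be sketched with ordinary set operations. The group contents below are an assumption: Figures 8d and 9 are not reproduced here, so the sets were derived from strands reconstructed from the blocks named later in the example. Function names are hypothetical.

```python
# Assumed strand sets (derived from the reconstructed strands A and B).
A_SET = frozenset("CAT ATG TGG GGT GTA TAC ACC CCT CTT TTG TAA".split())
B_SET = frozenset("ATG TGG GGT GTC TCC CCT CTT TTG GTA TAC ACC CTA".split())

# group name -> (address oligos in the group, their common unindexed address set)
GROUPS = {
    "I": (A_SET - B_SET, A_SET),
    "II": (B_SET - A_SET, B_SET),
    "III": (A_SET & B_SET, A_SET | B_SET),
}

def putative_primes(groups):
    """A common address set is a putative prime if no other set is a proper subset of it."""
    return [
        name
        for name, (_, members) in groups.items()
        if not any(other < members for _, other in groups.values())
    ]

def is_strand_set(name, groups):
    """Collect every address set containing the candidate; the candidate is a strand
    set if the oligos common to all of them equal the addresses of those sets."""
    candidate = groups[name][1]
    containing = [(addrs, members) for addrs, members in groups.values()
                  if candidate <= members]
    addresses = frozenset().union(*(addrs for addrs, _ in containing))
    common = frozenset.intersection(*(members for _, members in containing))
    return common == addresses

for name in putative_primes(GROUPS):
    print(name, is_strand_set(name, GROUPS))
```

Groups I and II pass both tests here, with eleven and twelve oligos respectively, matching the counts given for Figures 9a and 9b.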

Once the strand sets in a mixture have been identified, the
oligos in each strand set can be assembled into the strand
sequence in a series of steps, as illustrated in Figure 10 (which
utilises the strand set determined in Figure 9a).

First, the oligos in the strand set are assembled into
sequence blocks. A sequence block contains one or more uniquely
overlapping oligos. Two oligos of length n uniquely overlap
each other if they share an identical sub-sequence that is n-1
nucleotides long and no other oligos in the same strand set share
that sub-sequence. For example, for the strand set shown in
Figure 10a, the oligos "CAT" and "ATG" share the sub-sequence
"AT", which does not occur in other oligos. These two oligos
therefore uniquely overlap to form the sequence block "CATG", as
shown in Figure 10b. Similarly, oligo "TGG" uniquely overlaps
oligo "GGT" by the common sub-sequence "GG", and oligo "GGT" also
uniquely overlaps (on its other end) oligo "GTA" by the common
sub-sequence "GT". Thus, the three oligos ("TGG", "GGT", and
"GTA") can be maximally overlapped to form sequence block
"TGGTA". In forming sequence blocks, the following rule is
adhered to: two oligos can be included in the same block if they
are the only oligos in the strand set to possess their common
sub-sequence. Thus, "ATG" does not uniquely overlap "TGG",
because the strand set contains a third oligo, "TTG", that shares
the common sub-sequence "TG". If, following these rules, an
oligo does not uniquely overlap any other oligo, then a sequence
block consists of only that oligo. For example, "TAA" forms its
own block. Following the above rules, the eleven oligos that
occur in strand set A can be assembled into four sequence blocks.
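
The unique-overlap rule can be sketched as follows. The oligo list for strand set A is an assumption inferred from the blocks named in this paragraph (Figure 10a is not reproduced here), and `sequence_blocks` is a hypothetical helper, not a routine from the specification.

```python
def sequence_blocks(strand_set):
    """Merge equal-length oligos that uniquely overlap by all but one nucleotide."""
    next_of, has_prev = {}, set()
    for sub in {o[1:] for o in strand_set}:
        ending = [o for o in strand_set if o[1:] == sub]     # sub at the 3' end
        starting = [o for o in strand_set if o[:-1] == sub]  # sub at the 5' end
        # Unique overlap: exactly two different oligos carry this sub-sequence.
        if len(ending) == 1 and len(starting) == 1 and ending != starting:
            next_of[ending[0]] = starting[0]
            has_prev.add(starting[0])
    blocks, used = [], set()
    firsts = sorted(o for o in strand_set if o not in has_prev)
    for oligo in firsts + sorted(strand_set):  # second pass only matters for closed cycles
        if oligo in used:
            continue
        block, current = oligo, oligo
        used.add(oligo)
        while current in next_of and next_of[current] not in used:
            current = next_of[current]
            used.add(current)
            block += current[-1]
        blocks.append(block)
    return sorted(blocks)

# Assumed contents of strand set A (eleven trinucleotides).
STRAND_SET_A = set("CAT ATG TGG GGT GTA TAC ACC CCT CTT TTG TAA".split())
print(sequence_blocks(STRAND_SET_A))
```

For this assumed strand set the sketch yields the four blocks "CATG", "TAA", "TACCTTG", and "TGGTA".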

Second, the data contained in the indexed address sets shown
in Figure 8b are filtered to remove extraneous information that
does not pertain to strand set A. Figure 10c shows the resulting
filtered address sets. All address sets whose address oligo is
not one of the oligos in strand set A are eliminated. In
addition, all oligos that are not members of strand set A are
removed from the upstream and downstream subsets of the remaining
address sets. The resulting filtered address sets are then
grouped together according to the oligos that are contained in
each block. For example, the filtered address sets for address
oligos "CAT" and "ATG" have been grouped together in Figure 10c
because these two oligos are contained in sequence block "CATG".
In Figure 10c, the address oligos found in the same block are
identified by rectangular boxes. In addition, oligos that occur
in the same block are grouped together within each upstream and
downstream subset.
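
The filtering step is a pair of set intersections. A minimal sketch, assuming the same two reconstructed strands as the mixture (Figure 8a is not reproduced here); the helper names are hypothetical.

```python
from collections import defaultdict

STRAND_A = "CATGGTACCTTGGTAA"   # assumed (Figure 8a is not reproduced here)
STRAND_B = "ATGGTCCTTGGTACCTA"  # assumed (Figure 8a is not reproduced here)

def indexed_address_sets(strands, n=3):
    """Upstream and downstream subsets of every address oligo in the mixture."""
    up, down = defaultdict(set), defaultdict(set)
    for strand in strands:
        seen = []
        for i in range(len(strand) - n + 1):
            seen.append(strand[i:i + n])
            for oligo in seen:
                up[seen[-1]].add(oligo)
                down[oligo].add(seen[-1])
    return up, down

def filter_address_sets(up, down, strand_set):
    """Keep only addresses in strand_set, and only strand_set members in each subset."""
    return {
        address: (up[address] & strand_set, down[address] & strand_set)
        for address in up
        if address in strand_set
    }

up, down = indexed_address_sets([STRAND_A, STRAND_B])
strand_set_a = {STRAND_A[i:i + 3] for i in range(len(STRAND_A) - 2)}
filtered = filter_address_sets(up, down, strand_set_a)
print(sorted(filtered["ATG"][0]))  # filtered upstream subset of address "ATG"
```

Addresses that occur only in the other strand (for instance "GTC" under these assumptions) drop out entirely, and their oligos vanish from the remaining subsets.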

Third, the filtered address sets are converted into block
sets, as shown in Figure 10d. In a block set, the information
from different address sets is combined. Instead of a different
horizontal line for each filtered address set that pertains to a
particular block, the information in all of the address sets that
pertain to that particular block is combined into a single
horizontal line. For example, in Figure 10c, five different
filtered address sets pertain to sequence block "TACCTTG". In
Figure 10d, these five lines are combined into a single line in
which the address oligos are replaced by an "address block",
shown as "TACCTTG" surrounded by a bold box. Similarly, the
upstream oligos are replaced by upstream blocks, and the
downstream oligos are replaced by downstream blocks. In
substituting sequence blocks for the upstream (or downstream)
oligos that are contained in the filtered address sets for a
given address block, the following rule is adhered to: a sequence
block only occurs in the upstream subset (or in the downstream
subset) of an address block if every oligo that is contained in
that sequence block occurs in the upstream (or in the downstream)
subset of every filtered address set that pertains to that
address block. For example, sequence block "CATG" occurs in the
upstream subset of address block "TACCTTG" because oligos "CAT"
and "ATG" occur in the upstream subset of address oligos "TAC",
"ACC", "CCT", "CTT", and "TTG".
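
The substitution rule can be sketched as an all-pairs membership check. The mixture and the block list are assumptions (the strands were reconstructed from the blocks named in this example, since the figures are not reproduced here), and the function names are hypothetical.

```python
from collections import defaultdict

STRANDS = ["CATGGTACCTTGGTAA", "ATGGTCCTTGGTACCTA"]  # assumed mixture (A, B)
BLOCKS_A = ["CATG", "TGGTA", "TACCTTG", "TAA"]        # blocks of strand set A

def kmers(seq, n=3):
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]

def indexed_address_sets(strands, n=3):
    up, down = defaultdict(set), defaultdict(set)
    for strand in strands:
        seen = []
        for address in kmers(strand, n):
            seen.append(address)
            for oligo in seen:
                up[address].add(oligo)
                down[oligo].add(address)
    return up, down

def block_sets(blocks, up, down, n=3):
    """For each address block, list the blocks in its upstream and downstream subsets.
    Only oligos of the given blocks are ever looked up, so filtering is implicit."""
    def inside(block, address_block, subsets):
        return all(o in subsets[a]
                   for a in kmers(address_block, n)
                   for o in kmers(block, n))
    return {ab: ([b for b in blocks if inside(b, ab, up)],
                 [b for b in blocks if inside(b, ab, down)])
            for ab in blocks}

up, down = indexed_address_sets(STRANDS)
sets = block_sets(BLOCKS_A, up, down)
for address_block, (upstream, downstream) in sets.items():
    print(upstream, "<", address_block, "<", downstream)
```

Under these assumptions "CATG" appears upstream of address block "TACCTTG", and neither "CATG" nor "TACCTTG" appears in its own block set.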

Often, a sequence block does not occur in its own upstream
or downstream subset. For example, sequence block "CATG" does
not occur in the upstream or downstream subset of its own block
set (i.e., in block set "CATG"), because oligo "ATG" is not
present in the upstream subset of address set "CAT" and oligo
"CAT" is not present in the downstream subset of address set
"ATG". When a sequence block does not occur in its own upstream
or downstream subset, this indicates that that sequence block
occurs only once in the nucleotide sequence of that strand.
However, a sequence block may occur in both the upstream subset
and in the downstream subset of its own block set. For example,
sequence block "TGGTA" occurs in both the upstream subset and in
the downstream subset of block set "TGGTA". When a sequence
block does occur in its own upstream and downstream subsets, it
indicates that the sequence block may, but need not, occur more
than once in the sequence. The presence of more than one
parental strand in the original mixture can introduce additional
oligos into the filtered upstream and downstream subsets that can
cause a block that actually occurs only once in a sequence to
appear in both the upstream and downstream subsets of its own
block set. However, further analysis of the data determines the
multiplicity of each block in the strand (as described below),
thus resolving these uncertainties. For convenience, block sets
that pertain to blocks that definitely occur only once in the
sequence are listed together. For example, in Figure 10d, block
set "CATG" and block set "TACCTTG" are listed together.

Fourth, the position of each sequence block relative to the
other sequence blocks is determined. An examination of the block
sets that pertain to unique blocks (that definitely occur only
once in the sequence of the strand) indicates their relative
positions. For example, in Figure 10d, block set "CATG"
indicates that unique sequence block "TACCTTG" occurs downstream
of unique sequence block "CATG". This is confirmed by block set
"TACCTTG", in which unique sequence block "CATG" occurs upstream
of unique sequence block "TACCTTG". The relative position of the
two unique sequence blocks is indicated in Figure 10e, where the
top line to the left of the arrow shows "CATG" upstream (to the
left) of "TACCTTG". The relative position of the sequence blocks
that can potentially occur more than once in the nucleotide
sequence of the strand is determined from their presence or
absence in the upstream and downstream subsets of other sequence
blocks. For example, sequence block "TAA" occurs in the
downstream subset of block set "CATG" (and does not occur in the
upstream subset of block set "CATG"). Furthermore, sequence
block "TAA" also occurs in the downstream subset of block set
"TACCTTG" (and not in its upstream subset). Therefore, sequence
block "TAA" must occur downstream of both unique sequence blocks
"CATG" and "TACCTTG". This is indicated in Figure 10e, where the
bottom line to the left of the arrow shows "TAA" as occurring
downstream of "CATG" and "TACCTTG". Furthermore, sequence block
"TGGTA" occurs only in the downstream subset of block set "CATG".
Therefore, it must occur downstream of "CATG" in the sequence.
On the other hand, sequence block "TGGTA" occurs in both the
upstream and downstream subsets of block set "TACCTTG". This
indicates that "TGGTA" can potentially occur in the sequence at
positions both upstream and downstream of unique sequence block
"TACCTTG". Finally, "TGGTA" only occurs upstream of "TAA". This
is indicated in Figure 10e, where the bottom line to the left of
the arrow contains a bracket that shows the range of positions at
which "TGGTA" can occur, relative to the positions of the other
sequence blocks. At this point in the analysis, the diagram to
the left of the arrow in Figure 10e contains all the information
obtained that pertains to strand set A.

Finally, the sequence of the strand is ascertained by taking
into account both the relative position of the sequence blocks,
as shown in the diagram to the left of the arrow in Figure 10e,
and the identity of the sequences at the ends of the sequence
blocks. The object of this last step is to assemble the blocks
into the final sequence. Four rules are followed: (a) each of
the blocks must be used at least once; (b) the blocks must be
assembled into a single sequence; (c) the ends of blocks that are
to be joined must maximally overlap each other (i.e., if the
surveyed oligos are n nucleotides in length, then two blocks
maximally overlap each other if they share a terminal
sub-sequence that is n-1 nucleotides in length); and (d) the
order of the blocks must be consistent with their positions
relative to one another, as ascertained from the block sets. For
example, in Figure 10e, "CATG" is upstream of "TACCTTG". "CATG"
cannot be joined directly to "TACCTTG", since these two sequence
blocks do not possess maximally overlapping terminal sequences
(two nucleotides in length). However, an examination of the
permissible positions at which other sequence blocks can occur
indicates that "TGGTA" can occur in the gap between "CATG" and
"TACCTTG". The ends of these sequence blocks are then examined
to see whether the gap can be bridged. "CATG" can be joined to
"TGGTA" by maximally overlapping their shared terminal
sub-sequence "TG". Furthermore, "TGGTA" can be joined to
"TACCTTG" by maximally overlapping their shared terminal
sub-sequence "TA". Similarly, the gap that occurs downstream of
"TACCTTG" can potentially be filled by both "TAA" and "TGGTA".
"TAA" must be used, because it was not used at any other
location. However, "TACCTTG" cannot be directly joined to "TAA".
The solution is to join "TACCTTG" to "TGGTA", and then to join
"TGGTA" to "TAA". Thus, the sequence of strand A (which is shown
in Figure 10f) is unambiguously assembled by utilizing sequence
block "TGGTA" twice (as summarised in the diagram to the right of
the arrow in Figure 10e).
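
The final assembly can be sketched as a bounded search over block orders. This is an illustration under stated assumptions, not the procedure of the specification: it enforces rules (a) to (c) and uses each unique block exactly once, but it does not encode the positional constraints of rule (d). The search bound `max_blocks` and the function names are hypothetical.

```python
def assemble(blocks, unique, n=3, max_blocks=None):
    """Enumerate block orders in which consecutive blocks overlap by n-1 nucleotides,
    every block is used at least once, and unique blocks are used exactly once."""
    k = n - 1
    max_blocks = max_blocks or 2 * len(blocks)  # assumed bound on the search depth
    follows = {b: [c for c in blocks if b[-k:] == c[:k]] for b in blocks}
    solutions = []

    def extend(path):
        if set(path) == set(blocks):
            solutions.append(list(path))
        if len(path) == max_blocks:
            return
        for nxt in follows[path[-1]]:
            if nxt in unique and nxt in path:
                continue  # a unique block may not be reused
            extend(path + [nxt])

    for start in blocks:
        extend([start])
    return solutions

def join(path, n=3):
    """Merge an ordered list of blocks, collapsing each n-1 nucleotide overlap."""
    sequence = path[0]
    for block in path[1:]:
        sequence += block[n - 1:]
    return sequence

blocks_a = ["CATG", "TGGTA", "TACCTTG", "TAA"]
orders = assemble(blocks_a, unique={"CATG", "TACCTTG"})
for order in orders:
    print(order, join(order))
```

For the four blocks of strand A this search finds a single order, in which "TGGTA" is used twice. For strand B the overlap rules alone appear to admit more than one order, which is where the positional information of rule (d) is needed.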

The same procedure is followed to determine the sequence of
strand B (see Figure 11). In this example, there are three
sequence blocks that do not occur in their own upstream or
downstream subsets, and they therefore definitely occur only once
in the sequence of strand B (namely, sequence blocks "CTTG",
"GTCC", and "TACC"). An examination of block set "GTCC" shows
that "GTCC" occurs upstream of "CTTG" and "TACC". However, an
examination of block set "CTTG" and an examination of block set
"TACC" indicate that sequence blocks "CTTG" and "TACC" can both
occur upstream and downstream of each other, which appears to
conflict with the observation that these sequence blocks only
occur once in the sequence of strand B. There is actually no
conflict. Each of these sequence blocks does indeed occur only
once; it is just that their positions, relative to one another,
in strand B are obscured by the presence of conflicting
information from the relative positions of oligos that occur in
strand A. This ambiguity (indicated by the identical positions
of sequence blocks "CTTG" and "TACC" in the diagram to the left
of the arrow in Figure 11e) is resolved by the remainder of the
information. The positions of those sequence blocks that can
potentially occur more than once in the sequence of strand B are
determined from other block sets. First, the block sets of the
sequence blocks that definitely occur only once in the sequence
(namely, block sets "CTTG", "GTCC", and "TACC") are consulted.
The range of positions at which these other sequence blocks can
occur (relative to the positions of other blocks) is indicated in
the diagram to the left side of the arrow in Figure 11e.

The assembly of the nucleotide sequence of strand B proceeds
as follows. "ATG" is upstream of all other blocks. The uniquely
occurring block immediately downstream of "ATG" is "GTCC". "ATG"
and "GTCC" cannot be directly joined. However, "ATG" can be
directly joined to "TGGT", so the correct order is to join "ATG"
to "TGGT", and then to join "TGGT" to "GTCC". Neither "CTTG" nor
"TACC" can be directly joined to "GTCC". Three different
sequence blocks can be used to bridge this gap (namely, "CCT",
"GTA", and "TGGT"). The only combination of these three sequence
blocks that can fill this gap is "CCT" alone, which bridges the
gap between "GTCC" and "CTTG". This resolves the ambiguity as to
the relative positions of "CTTG" and "TACC": "CTTG" is therefore
upstream of "TACC". "CTTG" cannot be directly joined to "TACC".
Again, there are three different sequence blocks that can be used
to fill this gap (namely, "CCT", "GTA", and "TGGT"). The only
combination of these three sequence blocks that can fill this gap
is "TGGT" and "GTA" (i.e., "CTTG" is joined to "TGGT", "TGGT" is
joined to "GTA", and "GTA" is joined to "TACC"). And finally,
"CTA", which occurs downstream of all other blocks, must be
included in the sequence. However, "TACC" cannot be directly
joined to "CTA". There are three different sequence blocks that
can be used to fill this gap (namely, "CCT", "GTA", and "TGGT").
The only combination of these three sequence blocks that can fill
this gap is "CCT" alone. Thus, the assembly of the sequence of
strand B from its sequence blocks is completed. Note that some
sequence blocks that could potentially occur in the sequence more
than once actually occur only once (e.g., "GTA"), while others
actually occur more than once (e.g., "CCT").

Using the methods of this invention, the entire sequence of
strand B is unambiguously determined, despite the fact that some
oligos occur more than once in its sequence, despite the fact
that more than one sequence block can be assembled from the
oligos that occur in the strand, despite the fact that the
multiplicity of occurrence of each oligo is not determined during
surveying, despite the fact that the strand is analyzed in a
mixture of strands, and despite the fact that the other strand in
the mixture possesses many of the same oligos.

5. Uses of sectioned oligonucleotide arrays for 
manipulating nucleic acids —

In the examples described below, it is assumed that the
sequences of the nucleic acids to be manipulated have already
been established. It is not necessary, in these manipulations,
that the sample be distributed across the entire array. Instead,
a sample can be delivered directly to the well in the array where
a particular oligo (or a particular strand) is immobilised. The
arrays enable a large number of specifically directed
manipulations of nucleic acids to be carried out.

5.1. Cleavable primers —

Amplification of strands and partials following separation
(or generation) on a sectioned array requires that their ends be
provided with priming regions. The priming regions can be
undesirable in subsequent use, such as the making of recombinants
or site-directed mutants. For some uses it is desirable to
substitute new priming regions for the old. For those uses, the
primers used for amplification must first be removed from the 5'
ends.

Where the junction of the primer and the strand is contained
within a unique restriction site, the primer can be removed by
treating a double-stranded version of the strand with a
corresponding restriction endonuclease. However, restriction
sites will often not be present at the junctions. A solution to
this problem is to make the primer (or even only the junction
nucleotide in the primer) chemically different from the rest of
the strand. The primer in these examples resides at the strand's
5' terminus.

5.1.1. Cleavage of primers by alkaline hydrolysis or by 
ribonuclease digestion — 

This method is suitable for removal of oligoribonucleotide
primers, or mixed primers whose 3'-terminal nucleotide
(which becomes a junction nucleotide upon primer extension) is a
ribonucleotide. Such primers are incorporated at the 5' end of
DNA strands or partials during amplification.

Alkaline hydrolysis cleaves a phosphodiester bond that is on
the 3' side of a ribonucleotide, and leaves intact a
phosphodiester bond that is on the 3' side of a
deoxyribonucleotide. After alkaline hydrolysis, the pH of the
reaction mixture is returned to a neutral value by the addition
of acid, and the sample can be used without purification.
Primers containing a riboadenylate or a riboguanylate residue at
their 3' end can effectively be removed from a DNA strand or
partial by treatment with T> ribonuclease. After treatment, the
sample is heated to 100°C to inactivate the ribonuclease, and can
be used without purification. In both these cases, the released
5' terminus of the strand (or partial) is left dephosphorylated.
Therefore, if the strand obtained is subsequently used for
ligation, it should be phosphorylated by incubation with
polynucleotide kinase.

5.1.2. Cleavage of primers from DNA strands (or partials)
synthesized from phosphorothioate nucleotide precursors —

In this method, oligodeoxynucleotide or oligoribonucleotide
primers are synthesized from natural nucleotides, but strand
amplification is carried out in the presence of only
α-phosphorothioate nucleotide precursors. Subsequent digestion
of the synthesized strands with a 5'-3' exonuclease, such as calf
spleen 5'-3' exonuclease, results in the elimination of all
primer nucleotides except the original 3'-terminal (junction)
nucleotide of the primer, with the released 5'-terminal group of
a strand or partial being unphosphorylated. The junction
nucleotide is not removed, because it is joined to the rest of
the strand by a phosphorothioate diester bond. Therefore, the
strand obtained has an extra nucleotide at its 5' end. This does
not present a problem when the presence of the former junction
nucleotide at the 5' end of the strand is compatible with the
subsequent use of the strand. The presence of the extra
nucleotide can also be useful for site-directed mutagenesis.

If the primer-deprived strand so obtained is to be ligated,
the use of spleen exonuclease, which leaves 5'-hydroxyl groups,
must then be followed by phosphorylation with polynucleotide
kinase. Therefore, where the strand is to be ligated, the use of
bacteriophage lambda or bacteriophage T7 5'-3' exonuclease is
preferable over spleen exonuclease, since they leave
5'-phosphoryl groups at the site of cleavage.

5.2. Generation of recombinant nucleic acids —

In the method described below, two nucleic acid strands are
ligated in one round of ligation. It is possible to keep
repeating the process any desired number of times to ligate the
desired number of strands.

In this example, a sectioned array contains immobilised
oligos that consist of two portions, one complementary to the
3'-terminal sequence of one of the moieties to be ligated, and
the other complementary to the 5'-terminal sequence of the other
moiety to be ligated. The immobilised oligos can have either
free 3' or 5' ends. The relevant termini of the moieties to be
ligated should be deprived of priming regions, but priming
regions (preferably different) should be preserved at the
opposite termini to allow amplification of the recombinants.
After hybridisation in an appropriate well, the two nucleic acid
strands are ligated to each other utilising DNA ligase.
Unligated strands are then washed away. Only ligated strands
possess the two terminal priming regions required for PCR. The
strands that are to be ligated can be used in a mixture with
other strands, provided that no other strands have the same
oligos at the termini deprived of priming regions.
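
The two-portion immobilised oligo described above can be sketched as the reverse complement of the junction it is meant to hold together. The sequences, the arm length, and the function names below are hypothetical and serve only to illustrate the geometry.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def splint(upstream_strand, downstream_strand, arm=4):
    """Oligo (5'->3') that pairs with the 3' end of upstream_strand and the
    5' end of downstream_strand, holding the two termini next to each other."""
    junction = upstream_strand[-arm:] + downstream_strand[:arm]
    return reverse_complement(junction)

# Hypothetical strands whose facing termini carry no priming regions.
first = "GATTACAGGC"
second = "TTCGAACCGT"
print(splint(first, second))
```

For these hypothetical strands the splint is "CGAAGCCT", which pairs with the junction "AGGCTTCG". In practice the arm length would be chosen to suit the hybridisation conditions.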

Many different strands can be ligated to one particular
strand (or partial), to produce many recombinant variations of
one gene. In that case, one portion of the splint (i.e., the
immobilised oligo) is a constant segment, and the other portion
is a variable segment, i.e., a binary array is used. The
constant segment binds to the strand to be included in every
recombinant, and the variable segment binds to the end of a
strand to be fused with the invariant strand.

5.3. Site-directed mutagenesis —

The ability to prepare any partial of a strand according to 
the invention provides the opportunity to make nucleotide sub-
stitutions, deletions and insertions at any chosen position
within a nucleic acid. Moreover, the use of sectioned arrays 
makes it possible to perform site-directed mutagenesis at a 
number of positions (even at all positions) at once, and in a 
particular embodiment, to determine, within individual wells of 
the array, properties of the encoded mutant proteins. 

Mutations are introduced into a strand by first preparing
partials having variable ends that correspond to the segment to
be mutated, that segment preceding the location of the intended
mutation. Then mutagenic nucleotides or oligos are introduced
into the variable ends. The mutated partials are then extended
to the length of the full-sized strand using the complementary
copy of the original non-mutated strand as a template.

In this method, complements of partials (i.e., strands whose
5' termini are variable, and 3' termini are fixed) are used.
Their 5'-terminal priming regions are removed, the exposed 5'
termini are phosphorylated by incubation with polynucleotide
kinase, and the partials are then ligated by incubation with RNA
ligase to the free 3' hydroxyls of oligoribonucleotides
immobilised on a 3' sectioned ordinary array. The sequence of
the immobilized oligo to which a partial is ligated is identical
to the oligo segment that occurs in the original (full-length)
strand immediately adjacent to the end of the partial, except for
one (or a few) nucleotide difference(s) that correspond to the
mutation(s) to be introduced.

The nucleotide differences are preferably located at the 3'
terminus of the immobilized oligo, and can correspond to a
nucleotide substitution, insertion, or deletion. A deletion can
be of any size. For a large insertion, the ligated partial, or
the immobilised oligo, can first be fused to a nucleic acid
containing all or part of the sequence to be inserted.

After washing away material not covalently bound, the
immobilized strand is linearly copied, taking advantage of the
priming region at its (fixed) 3' end. The copies correspond to
partials that have been extended by the oligos containing the
mutation(s). The copies are annealed to their complementary
full-length strands, and their 3' termini extended by incubation
with DNA polymerase, using the parental strand as a template.
Finally, the extended mutant strands are amplified by PCR. It is
important that the primers utilised for amplification of a
partial used for mutagenesis be different from the primers used
to amplify the original (non-mutant) full-length strand. This
assures that only mutant strands are amplified.

We claim:

1. A binary oligonucleotide array comprising an array of
predetermined areas on a surface of a solid support, each area
having therein, covalently linked to said surface, multiple
copies of a binary oligonucleotide of a predetermined sequence,
said binary oligonucleotide consisting of a constant nucleotide
sequence adjacent to a variable nucleotide sequence, wherein the
constant nucleotide sequence is the same for all oligonucleotides
in the array.

2. A binary array according to claim 1 wherein the binary
oligonucleotides consist of deoxyribonucleotides.

3. A binary array according to claim 1 wherein the binary
oligonucleotides consist of ribonucleotides.

4. A binary array according to claim 1 wherein one or more of
the nucleotides of the binary oligonucleotides are modified.

5. A binary array according to claim 1 wherein one or more of
the nucleotides of the binary oligonucleotides are non-standard.

6. A binary array according to claim 1 wherein the binary
oligonucleotides are mixed.

7. A comprehensive binary array according to claim 1.

8. A comprehensive binary array according to claim 7 wherein the
binary oligonucleotides in each area have variable sequences of
the same length.

9. A 3' binary array according to claim 1.

10. A 5' binary array according to claim 1.
11. A 3' binary array according to claim 9, wherein each
covalently linked binary oligonucleotide has its constant
sequence adjacent to the 5' end of its variable sequence.

12. A 5' binary array according to claim 10, wherein each
covalently linked binary oligonucleotide has its constant
sequence adjacent to the 3' end of its variable sequence.

13. A binary array according to claim 2 wherein all or part of
the constant nucleotide sequence is complementary to a predeter-
mined restriction recognition sequence.

14. A binary array according to claim 1 having an oligo-
nucleotide hybridised to all or part of the constant sequence
which is ligatable to the terminus of an adjacent nucleic acid
hybridised to the oligonucleotide.

15. In an oligonucleotide array having variable-sequence oligo- 
nucleotides immobilised in a predetermined pattern of areas on a 
solid support, the improvement comprising including in said 
oligonucleotides a constant sequence of predetermined length. 

16. A sectioned binary array according to claim 1. 

17. A comprehensive sectioned binary array according to claim
16.

18. A 3' binary oligonucleotide array according to claim 17,
wherein each covalently linked binary oligonucleotide has its
variable sequence adjacent to the 5' end of its constant
sequence.

19. A 5' binary oligonucleotide array according to claim 17,
wherein each covalently linked binary oligonucleotide has its
variable sequence adjacent to the 3' end of its constant
sequence.
20. A binary oligonucleotide array according to claim 1, wherein
said constant nucleotide sequence comprises one or more func-
tional sequences selected from the group consisting of a nucleic
acid polymerase priming region, an RNA polymerase promoter
region, and a restriction endonuclease recognition site.

21. A binary oligonucleotide array according to claim 20,
wherein said functional sequence is a priming region.

22. A binary oligonucleotide array according to claim 1, wherein
each binary oligonucleotide is covalently linked to said surface
through a long polymer chain.

23. A binary oligonucleotide according to claim 2, wherein said
deoxyribonucleotides comprise at least one modified nucleotide.

24. A sectioned oligonucleotide array comprising an array of
predetermined areas on a surface of a solid support, each area
having therein, covalently linked to said surface, multiple copies
of an oligonucleotide, wherein said areas are physically separ-
ated from one another into sections, such that nucleic acids in
an aqueous solution generated in one section cannot migrate to
another section.

25. A sectioned oligonucleotide array according to claim 24
further comprising a lattice attached to said surface.

26. A sectioned oligonucleotide array according to claim 25,
wherein said lattice is removably attached to said surface.

27. A sectioned oligonucleotide array according to claim 25,
further comprising a cover removably attachable to said lattice.

28. A sectioned oligonucleotide array according to claim 24,
wherein said sections comprise wells in said solid support.
29. A sectioned oligonucleotide array according to claim 28,
further comprising a cover removably attachable to said solid
support.

30. A sectioned oligonucleotide array according to claim 24,
comprising a gel which physically separates said areas by prev-
enting nucleic acids in an aqueous solution placed in one area
from migrating to another area.

31. A sectioned oligonucleotide array according to claim 24,
wherein said sections are mechanically separated from one
another.

32. A sectioned oligonucleotide array according to claim 27,
wherein said cover comprises a replica array.

33. A sectioned oligonucleotide array according to claim 29,
wherein said cover comprises a replica array.

34. A sectioned array according to claim 24 wherein all of the
oligonucleotides in individual areas are of the same sequence.

35. A sectioned array according to claim 24 wherein not all
oligonucleotides in each area are of the same sequence.

36. A method of sorting a mixture of nucleic acid strands
comprising the steps of:

a) providing a solution containing a mixture of nucleic
acid strands in single-stranded form and

b) contacting said solution to a first binary oligo-
nucleotide array of predetermined areas on a surface of a solid
support, each area having therein, covalently linked to said
surface, copies of a binary oligonucleotide, said binary oligo-
nucleotide consisting of a constant nucleotide sequence adjacent
to a variable nucleotide sequence, wherein the constant nucleo-
tide sequence is the same for all oligonucleotides in the array,
wherein said step of contacting is carried out under conditions
promoting perfect hybridisation of said strands to said binary
oligonucleotides.

37. A method according to claim 36 wherein said array is com-
prehensive.

38. A method according to claim 36 wherein said array is a 3'
array.

39. A method according to claim 36 wherein said binary oligo-
nucleotides are complementary to sequences that possibly occur in
the strands in said mixture.

40. A method according to claim 39 wherein said array is com-
prehensive.

41. A method according to claim 36 wherein said array is a
sectioned array, further comprising the step of amplifying
strands hybridized in at least some of said areas to produce
copies of said hybridized strands.

42. A method according to claim 36 further comprising removing
strands that have not perfectly hybridised.

43. A method according to claim 42 further comprising adding a
terminal extension to at least one terminus of the strands, said
terminal extension having a sequence which substantially does not
occur in the strands.

44. A method according to claim 43 wherein a terminal extension
is added to the strands by ligation of hybridized strands to
masking oligonucleotides, said masking oligonucleotides being
also hybridised to said binary oligonucleotides.

45. A method according to claim 44 wherein a second terminal
extension is added to the strands prior to said step of con-
tacting, said second terminal extension being added to termini
not hybridised to said binary oligonucleotides during said step
of contacting.

46. A method according to claim 42 further comprising releasing
hybridised strands on a sectioned array into solution without
mixing of material in said areas and rebinding them to said
binary oligonucleotides followed by removing unhybridised
strands.

47. A method according to claim 42 further comprising releasing
hybridised strands in solution and rebinding to a replica array
followed by removing unhybridised strands.

48. A method according to claim 42 wherein the mixture of
nucleic acid strands comprises RNA.

49. A method according to claim 42 wherein the mixture of
nucleic acid strands is comprised of DNA fragments obtained by
site specific degradation.

50. A method according to claim 43 wherein the mixture is
comprised of DNA fragments obtained by digestion with a restric-
tion endonuclease and wherein the constant region of the binary
oligonucleotide contains the complement of the restriction
endonuclease recognition site, and wherein addition of the
terminal extension restores the recognition site.

51. A method according to claim 42 further comprising generating
complementary copies of hybridized strands.

52. A method according to claim 51 wherein the array is a 3'
array wherein each binary oligonucleotide has its variable
sequence adjacent to the 5' end of its constant sequence, and the
copies are generated using a DNA polymerase and using the binary
oligonucleotide as a primer.
53. A method according to claim 51 wherein the array is a 5'
array wherein each binary oligonucleotide has its variable
sequence adjacent to the 3' end of its constant sequence, and the
copies are generated using a DNA polymerase using a primer
hybridized to a 3' terminal extension of the hybridised strands,
and the copies are then ligated to the 5' end of the binary
oligonucleotides.

54. A method according to claim 44 further comprising amplifying
the hybridised strands.

55. A method according to claim 51 further comprising removing
the hybridised strands and amplifying the complementary copies of
the hybridized strands.

56. A method according to claim 55 wherein the hybridised
strands have 3' and 5' terminal extensions, and the amplification
is a polymerase chain reaction.

57. A method according to claim 55 wherein the hybridised
strands have a terminal extension and the amplification is
linear.

58. A method according to claim 36 wherein said step of pro-
viding comprises digesting genomic DNA with a restriction endo-
nuclease to create DNA fragments;

(a) modifying said fragments by adding a first constant
sequence to their strands' 3' termini and a second constant
sequence to their strands' 5' termini to create priming regions
including restored restriction sites; and

(b) denaturing the modified fragments to form a mixture of
single nucleic acid strands.

59. A method according to claim 58 wherein said array is a
sectioned, comprehensive array, further comprising the step of
amplifying strands hybridised in said areas by symmetric PCR.
60. A method according to claim 58 further comprising the step
of amplifying said mixture of single nucleic acid strands by
asymmetric PCR.

61. A method according to claim 36 wherein said binary oligo-
nucleotides or portions thereof are complementary to terminal
sequences that possibly occur in one end of the strands in said
mixture and that are substantially non-complementary to internal
sequences in the strands in said mixture.

62. A method according to claim 61 wherein said array is a
sectioned array, further comprising the step of amplifying
strands hybridised in at least some of said areas to produce
amplified copies of said single nucleic acid strands.

63. A method according to claim 62 wherein said array is a
comprehensive array.

64. A method according to claim 62 wherein said array is a 3'
array.

65. A method according to claim 61 wherein said step of provid-
ing comprises digesting genomic DNA with a restriction endo-
nuclease to create DNA fragments, modifying said fragments by
adding a first constant sequence to their strands' 3' termini to
create priming regions including restored restriction sites, and
denaturing the modified fragments into a mixture of single
nucleic acid strands.

66. A method according to claim 61 wherein said step of provid-
ing comprises digesting genomic DNA with a restriction endo-
nuclease to create DNA fragments;

(a) modifying said fragments by adding a first constant
segment to one of their strands' 3' and 5' termini to create
priming regions including restored restriction sites; and

(b) denaturing the modified fragments into a mixture of
denatured nucleic acid strands each having a priming region only
at one end.

67. A method according to claim 66 wherein said first binary
sorting array is a 3' array.

68. A method according to claim 67 further comprising the steps
of

(a) generating an immobilised copy of each strand hybrid-
ised to the array by incubation with a DNA polymerase using the
immobilised oligonucleotide as a primer and a hybridised strand
as a template; and

(b) washing to remove from the array all materials not
covalently bound to the array.

69. A method according to claim 68, wherein said step of modify-
ing comprises adding a first constant sequence to their strands'
5' termini and wherein said 3' array contains binary oligo-
nucleotides to which are hybridized masking oligonucleotides,
further comprising the steps of

(a) ligating said masking oligonucleotides to denatured
nucleic acid strands hybridised to said binary oligonucleotides
such that their 3' termini are immediately adjacent to one of
said masking oligonucleotides, and

(b) washing under conditions such that only strands so
ligated will remain.

70. A method according to claim 69 wherein said step of adding a
first constant sequence includes ligation of a double-stranded
oligodeoxyribonucleotide adaptor.

71. A method according to claim 69 wherein said step of adding a
first constant sequence includes ligation of a single-stranded
oligoribonucleotide.
72. A method according to claim 68 wherein said step of modify-
ing comprises adding a first constant sequence to their strands'
3' termini.

73. A method according to claim 72 wherein said first constant
sequence is a homopolynucleotide tail added by extension of the
strands' 3' termini by enzymatic extension.

74. A method according to claim 72 further comprising the step
of adding a second constant sequence to the 3' termini of the
immobilised copies.

75. A method according to claim 74 wherein said second constant
sequence is a homopolynucleotide tail added by extension of said
immobilised copies' 3' termini by enzymatic extension.

76. A method according to claim 68 wherein said first binary
oligonucleotide array is a sectioned array, further comprising
the step of amplifying said washed, immobilised copies to produce
amplified copies.

77. A method according to claim 76 wherein said step of amplify-
ing comprises PCR.

78. A method according to claim 76 wherein said first binary
oligonucleotide array is a comprehensive array.

79. A method according to claim 76 further comprising contacting
said amplified copies from at least one area of said 3' array to
a second binary oligonucleotide array containing immobilised
binary oligonucleotides whose constant sequence is identical or
complementary to the 3' terminus of the immobilized copies.

80. A method according to claim 62 further comprising contacting
said amplified copies from at least one area of said first binary
oligonucleotide array to a second binary oligonucleotide array
containing immobilized binary oligonucleotides that are com-
plementary to terminal sequences that possibly occur in either
the other ends of said denatured nucleic acid strands or the
complements of said other ends, and that are not complementary to
internal sequences in the strands in said mixture or their
complements.

81. A method according to claim 61 wherein said step of provid-
ing comprises digesting genomic DNA with a restriction endo-
nuclease to create DNA fragments, and denaturing said fragments
into a mixture of denatured nucleic acid strands.

82. A method according to claim 81 wherein said first binary
oligonucleotide array is a 3' array containing binary oligo-
nucleotides to which are hybridised masking oligonucleotides,
further comprising the steps of ligating said masking oligo-
nucleotides to denatured nucleic acid strands hybridized to said
binary oligonucleotides such that their 3' termini are immedi-
ately adjacent to one of said masking oligonucleotides, washing
under conditions such that only strands so ligated will remain,
and generating an immobilized copy of each ligated strand by
incubation with a DNA polymerase.

83. A method according to claim 82 further comprising the steps
of adding a constant sequence to the 5' termini of the hybridised
strands by ligation of a single-stranded oligoribonucleotide;
incubating with a DNA polymerase to extend the immobilized
copies; washing to remove from the array all materials not
covalently bound to the array; and amplifying said washed,
immobilized copies to produce amplified copies.

84. A method according to claim 83 wherein said step of amplify-
ing comprises PCR.

85. A method according to claim 83 wherein said first sorting
array is a comprehensive array.

86. A method according to claim 83 further comprising contacting
said amplified copies from at least one area of said 3' array to
a second binary array containing immobilized binary oligonucleo-
tides whose constant sequence is identical or complementary to
the 3' terminus of said immobilised copies.

87. A method according to claim 67 further comprising the steps
of adding a constant sequence to the 3' termini of the immobi-
lized copies by enzymatic extension thereof; washing to remove
from the array all materials not covalently bound to the array;
and amplifying said washed, immobilised copies to produce ampli-
fied copies.

88. A method according to claim 87 wherein said step of amplify-
ing comprises PCR.

89. A method according to claim 87 wherein said first sorting
array is a comprehensive array.

90. A method according to claim 87 further comprising contacting
said amplified copies from at least one area of said 3' array to
a second terminal binary array containing immobilised binary
oligonucleotides whose constant sequence is identical or com-
plementary to the 3' terminus of said immobilised copies.

91. A method according to claim 61 wherein said step of provid-
ing comprises digesting genomic DNA with a site-specific cleaving
agent to create DNA fragments.

92. A method according to claim 91 wherein said agent is an
endonuclease.

93. A method according to claim 91 wherein said agent is a
chemical agent.

94. A method according to claim 61 wherein said nucleic acid
strands are cDNA strands.
95. A method according to claim 61 wherein said nucleic acid
strands are RNA strands.

96. A method according to claim 95 wherein said RNA strands are
eukaryotic mRNA strands, and wherein said step of providing
comprises removing 5'-cap structures.

97. A method according to claim 95 wherein said RNA strands lack
a poly(A) tail.

98. A method according to claim 61 wherein said step of provid-
ing comprises digesting genomic DNA with a restriction endo-
nuclease to create DNA fragments;

(a) modifying said fragments by adding a first constant
sequence to their strands' 3' termini and a second constant
sequence to their strands' 5' termini to create priming regions
including restored restriction sites; and

(b) denaturing the modified fragments into a mixture of
single nucleic acid strands.

99. A method according to claim 98 wherein the 3' priming
regions are complementary to the 5' priming regions.

100. A method according to claim 99 wherein said array is a 3'
array, further comprising the steps of

(a) generating an immobilised copy of each strand hybrid-
ised to the array by incubation with a DNA polymerase; and

(b) washing to remove from the array all materials not
covalently bound to the array.

101. A method according to claim 100 wherein said array is a
sectioned array, further comprising the step of amplifying
strands hybridised in at least some areas by PCR to produce
amplified copies of each said immobilized copy.
102. A method according to claim 101 wherein said array is a
comprehensive array.

103. A method according to claim 99 wherein addition of said
first constant sequence and said second constant sequence
includes ligation of a double-stranded oligodeoxyribonucleotide
adaptor to the strands' 5' termini.

104. A method according to claim 99 wherein addition of said
first constant sequence and said second constant sequence
includes ligation of a single-stranded oligonucleotide to the
strands' 5' termini.

105. A method according to claim 99 wherein addition of said
first constant sequence and said second constant sequence
includes enzymatic extension of the strands' 3' termini by the
synthesis of a homopolynucleotide tail.

106. A method according to claim 101 further comprising contact-
ing said amplified copies from at least one area of said 3'
array to a second binary array under conditions promoting hybrid-
isation of said amplified copies to the binary oligonucleotides
in said second array.

107. A method according to claim 106 wherein said amplified
copies are produced by symmetric PCR and wherein said second
array is a 3' array.

108. A method according to claim 106 wherein said first array and
said second array are comprehensive.

109. The product of a method according to claim 100.

110. A method of sorting a mixture of nucleic acid strands
comprising the steps of

a) providing a solution containing a mixture of nucleic
acid strands in single stranded form, and

b) contacting said solution to an oligonucleotide array of
predetermined areas on a surface of a solid support, each area
having therein copies of an immobilised oligonucleotide, the
nucleotide sequence of immobilised oligonucleotides in separate
areas being different, wherein said contacting is performed under
conditions that promote the formation of perfect hybrids.

111. A method according to claim 110 wherein said array is
comprehensive.

112. A method according to claim 110 wherein the array is
sectioned.



113. A method according to claim 110 wherein the immobilised
oligonucleotides are between 6 and 30 nucleotides long.

114. A method according to claim 110 wherein the array is a 3'
array.

115. A method according to claim 110 wherein the array is a 5'
array.



116. In a method wherein two nucleic acid strands are ligated to
each other in order to form a recombinant product, the improve-
ment comprising hybridising first nucleic acid strands to
immobilised oligonucleotides in an oligonucleotide array prior to
ligation to second nucleic acid strands, said oligonucleotide
array comprising an array of predetermined areas on a surface of
a solid support, each area having copies of an oligonucleotide
immobilised thereon.



117. A method according to claim 116 wherein the first nucleic
acid strands have different nucleotide sequences in each area of
the array.

118. A method according to claim 116 wherein the second nucleic
acid strands have different nucleotide sequences in each area of
the array.

119. A method according to claim 116 wherein the array is a
comprehensive array.

120. A method according to claim 116 wherein the oligonucleotides
immobilized in each area are of the same length.

121. A method according to claim 116 wherein the oligonucleotides
consist of the group consisting of deoxyribonucleotides, ribo-
nucleotides, mixed deoxyribonucleotides and ribonucleotides,
modified deoxyribonucleotides, modified ribonucleotides, and non-
standard nucleotides.



122. A method according to claim 116 wherein the second nucleic
acid strands are not also hybridized to the immobilised oligo-
nucleotides.



123. A method according to claim 122 wherein the second nucleic
acid strands are strands of double stranded nucleic acids.



124. A method according to claim 123 wherein the set of double
stranded nucleic acids has one end adapted for ligation to blunt
ends formed by hybridization of the first nucleic acids to the
immobilised oligonucleotides.

125. A method according to claim 116 wherein non-ligating termini
of the first nucleic acid strands and the double stranded nucleic
acids contain priming regions for amplification.

126. A method according to claim 125 wherein following ligation
of the first nucleic acids to the second nucleic acids, poly-
merase chain reaction amplification is carried out.
127. A method according to claim 124 wherein the double stranded
nucleic acids are ligated to the immobilised oligonucleotide
using DNA ligase prior to ligation of the first nucleic acid
strands and the second nucleic acid strands.

128. A method according to claim 123 wherein the second set of
nucleic acids is the same in every area of the array.

129. A method according to claim 123 wherein the first nucleic
strands are hybridised to the immobilised oligonucleotides while
contained in a mixture of one or more different strands, said
different strands having terminal sequences different from
corresponding termini to be ligated of the first nucleic acid
strands.
130. A method according to claim 116 wherein both the first
nucleic acid strands and the second nucleic acid strands are
hybridised to the immobilised oligonucleotides in the array prior
to ligation.

131. A method according to claim 130 wherein both the first and
second nucleic acid strands contain priming regions at their non-
ligating termini.

132. A method according to claim 131 wherein the first and
second nucleic acid strands are amplified in a polymerase chain
reaction following ligation.

133. A method according to claim 130 wherein both the first and
second nucleic acids are, prior to hybridisation to the immobi-
lised oligonucleotides, contained in mixtures of nucleic acids
having terminal sequences different from the corresponding
termini to be ligated of the first nucleic acid strands and the
second nucleic acid strands.

134. A method according to claim 36 further comprising sorting
the hybridised nucleic acid strands or their copies in an area of
the first binary array by contacting them to a second oligo-
nucleotide array.

135. A method according to claim 134 wherein the strands or their
copies are contacted to all areas of the array.

136. A method according to claim 36 wherein the nucleic acid
strands are contacted to all areas of a second binary array.

137. A method according to claim 134 wherein cleavable primers
are used following said step of contacting for amplification of
hybridised strands.

138. A method according to claim 137 further comprising cleaving
the cleavable primers from the strands and adding new terminal
extensions.

139. A method according to claim 134 wherein the contents of an
area of the first binary array are contacted with only predeter-
mined areas of a second binary array.

140. A method according to claim 36 further wherein contents in
an area of the binary array are contacted with the corresponding
area of a replica array.

141. A method according to claim 134 wherein the second oligo-
nucleotide array is a second binary array.

142. A method for introducing a site directed mutation into a
nucleic acid strand on an oligonucleotide array using a partial,
said partial corresponding to a region of the nucleic acid strand
adjacent to the location of the site directed mutation to be
introduced, comprising the steps:

(a) separately ligating said partial to the free terminus
of a preselected immobilised oligonucleotide in the oligo-
nucleotide array to obtain a mutated partial, said oligonucleo-
tide array comprising an array of predetermined areas on the
surface of a solid support, each area having therein a pre-
selected immobilised oligonucleotide, said preselected oligo-
nucleotide having a sequence adapted to introduce a mutation to
the partial added to the area; and

(b) generating, using the mutated partial, a nucleic acid
containing the mutation.

143. A method according to claim 142 wherein step b is
accomplished by

(a) hybridizing a complementary copy of the mutated partial
to a template having the complementary sequence of the terminal
portion of the nucleic acid strand which is not contained in the
partial; and

(b) carrying out a polymerase reaction, a ligation reaction
or both a polymerase reaction and ligation reaction to join the
remaining region of the nucleic acid strand to the mutated
partial.

144. A method for making immobilised partial copies of a nucleic
acid strand on a 3' or 5' oligonucleotide array, comprising the
steps:

(a) hybridising the strand to the array by an oligo-
nucleotide segment contained in the strand, said array comprising
predetermined areas on a surface of a solid support, each area
having therein immobilised oligonucleotides consisting of a
predetermined variable sequence, said hybridisation taking place
under conditions that promote the formation of perfect hybrids of
the length of the immobilised oligonucleotide in each area, and

(b) where the strand is hybridized to a 3' array, enzymati-
cally extending the immobilised oligonucleotide using the hybrid-
ized strand as a template, and where the strand is hybridised to
a 5' array, hybridising a primer to a priming region contained in
the 3' terminus of the hybridized strand, then enzymatically
extending the primer to form an extension product, then ligating
the extension product to the immobilised oligonucleotide.
145. A method according to claim 144 wherein the strand is
hybridised to a 3' array, further comprising amplifying the
immobilised partial copies using a primer or promoter complement
appropriate to hybridise to a priming region or promoter sequence
at the immobilised partial copies' 3' termini, and an appropriate
polymerase.

146. A method according to claim 144 wherein the oligonucleotide
array is substantially comprehensive.

147. A method according to claim 146 wherein a substantially
complete set of immobilized partial copies is generated on the
array by

(a) hybridising the strand to the array by substantially
all oligonucleotides present in the strand;

(b) performing step (b) on all hybridized strands.

148. A method according to claim 146 wherein a substantially
complete set of amplified partials is generated on a 3' array by

(a) hybridizing the strand to the 3' array by substantially
all oligonucleotides present in the strand;

(b) performing step (b) on all hybridised strands; and

(c) amplifying substantially all immobilised partial copies
by using a primer or promoter complement appropriate to hybridise
to a priming region or promoter sequence at the partial copy's
fixed terminus, and an appropriate polymerase.

149. A method according to claim 148 wherein following step (a)
unhybridized and imperfectly hybridized strand copies are
removed.

150. A method according to claim 143 wherein the array is
sectioned.

151. A method according to claim 150 wherein the strand is
contained in a mixture of strands which are subjected to the same
steps on the array.
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152. A method according to claim 151 wherein the pricing region 
is a terminal extension introduced in all strands in the fixture, 

153. A method according to claim 149 wherein the priming region
or promoter is added to the 5' terminus of the nucleic acid
strand prior to hybridizing the strand to the array.

154. A method according to claim 150 further wherein the
oligonucleotide content in an area of the array is surveyed.

155. The product of a method according to claim 144.

156. The product of a method according to claim 146.

157. A method according to claim 144 wherein the strand is
contained in a mixture of sorted strands subjected to the method,
said mixture of sorted strands being from an area of a sorting
array.

158. A method according to claim 157 further wherein mixtures of
strands from different areas of the sorting oligonucleotide array
are hybridized to the 3' or 5' oligonucleotide array.

159. A method according to claim 144 wherein the nucleic acid is
a previously prepared partial.

160. A method according to claim 145 further comprising sorting
partials or their copies from an area of the oligonucleotide
array on a second oligonucleotide array.

161. A method according to claim 145 further comprising sorting
partials or their copies from an area of the oligonucleotide
array according to variable sequences adjacent their fixed ends
on a binary oligonucleotide array.

162. A method of claim 144 further comprising ligating a partial
or its copy in single stranded or double stranded form to a
second nucleic acid strand.

163. A method according to claim 162 wherein the second nucleic
acid strand is a previously obtained partial.

164. A method according to claim 145 further wherein a cleavable
primer, at an end of a partial to be ligated, is used for
amplification, and further comprising cleaving the primer and then
ligating the partial to a second nucleic acid strand.

165. A method according to claim 162 further comprising
exponentially amplifying ligated product using priming regions at
non-ligated termini.

166. A method according to claim 165 further wherein the priming
regions at the non-ligated termini of the ligated product are
adapted to permit amplification only of the ligated product.

167. A method according to claim 144 further wherein a partial
obtained is ligated to an oligonucleotide or to a second nucleic
acid strand adapted to introduce a site directed mutation, with
respect to the nucleic acid strand that the partial was generated
from, at the ligated terminus of the partial.

168. A method according to claim 167 wherein the oligonucleotide
is immobilized in a second oligonucleotide array.

169. A method for sorting partials by their variable termini on a
binary oligonucleotide array, which partials have been prepared
by random chemical or enzymatic degradation of one or more
nucleic acid strands, said binary array comprising an array of
predetermined areas on a surface of a solid support, each area
having therein copies of a binary oligonucleotide of a
predetermined sequence, said binary oligonucleotide consisting of a
constant nucleotide sequence adjacent to a variable nucleotide
sequence, said variable nucleotide sequence being at the free end
of the binary oligonucleotides, said binary oligonucleotide also
having a complementary masking oligonucleotide hybridized to all
or a part of the constant nucleotide sequence, including the
portion of the constant nucleotide sequence adjacent the variable
nucleotide sequence, comprising the steps of:

(a) hybridizing the partials to the array by their termini
under conditions that promote the formation of perfect hybrids;
and

(b) ligating the termini of the partials to the masking
oligonucleotide.

170. A method for obtaining information for determining the 
sequence of a nucleic acid strand comprising 

(a) generating a substantially complete set of partials of
the nucleic acid strand; and 

(b) for groups of partials, having the same terminal 
variable nucleotide sequence of predetermined length, separately 
determining the presence and sequence of all variable oligo- 
nucleotides of the predetermined length. 
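
As a rough illustration of steps (a) and (b) of claim 170, the following Python sketch generates a complete set of one-sided partials of a short strand, groups them by their terminal variable sequence, and lists the oligonucleotides present in each group. It assumes that the partials share a fixed 5' end and vary at the 3' end. The function names and the example strand are hypothetical, and the in-silico enumeration merely stands in for the hybridization surveys that the claim recites.

```python
from collections import defaultdict


def oligo_content(strand, n):
    """Return the set of all length-n oligonucleotides found in strand."""
    return {strand[i:i + n] for i in range(len(strand) - n + 1)}


def complete_partials(strand, n):
    """All partials that share the strand's 5' end (assumed to be the fixed
    end) and are at least n nucleotides long."""
    return [strand[:end] for end in range(n, len(strand) + 1)]


def survey_by_terminus(strand, n):
    """Group partials by their terminal n-mer and pool each group's n-mers."""
    groups = defaultdict(set)
    for partial in complete_partials(strand, n):
        groups[partial[-n:]] |= oligo_content(partial, n)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical 8-nucleotide strand surveyed with trinucleotides.
    for terminus, oligos in survey_by_terminus("ACCTAGGT", 3).items():
        print(terminus, sorted(oligos))
```

Each group's oligonucleotide list grows as the variable terminus moves further from the fixed end, which is the positional information that surveying the unsorted strand alone does not provide.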

171. In a method for surveying oligonucleotide content of a
nucleic acid strand as part of a sequencing method wherein the
strand is hybridized to a comprehensive oligonucleotide array,
and the presence of hybridized strands in areas of the array is
detected, the improvement comprising:

(a) preparing a substantially complete set of partials of
the strand prior to surveying;

(b) sorting the partials by their variable ends on an
oligonucleotide array; and

(c) separately surveying oligonucleotide content of each
group of sorted partials.

172. A method according to claim 171 wherein the strand is in a
mixture of strands which are subjected to the same steps.

173. A method according to claim 172 wherein the substantially
complete set of partials is prepared by chemical or enzymatic
degradation of the strands and the strands are sorted on a binary
oligonucleotide array, said binary array comprising an array of
predetermined areas on a surface of a solid support, each area
having therein copies of a binary oligonucleotide of a
predetermined sequence, said binary oligonucleotide consisting of a
constant nucleotide sequence of predetermined length and
nucleotide sequence adjacent to a variable nucleotide sequence.

174. A method according to claim 173 wherein said binary
oligonucleotide array comprises a 3' array, said immobilized
oligonucleotides consisting of a constant sequence at the 5' terminus
of a variable sequence.
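
The layout of a binary 3' array as recited in claims 173 and 174 can be pictured with a short Python sketch: every area holds one oligonucleotide made of the same constant sequence followed, on its 3' side, by one of the 4^n possible variable sequences. The constant sequence `GAATTC`, the variable length of three, and the function name are illustrative assumptions, not values taken from the claims.

```python
from itertools import product

BASES = "ACGT"


def binary_array(constant, variable_length):
    """Map each variable sequence (used here as the area's address) to the
    oligonucleotide immobilized in that area.

    Oligonucleotides are written 5'->3': the constant segment sits at the
    5' terminus of the variable segment, so the variable segment is at the
    3' end.
    """
    return {
        "".join(v): constant + "".join(v)
        for v in product(BASES, repeat=variable_length)
    }


if __name__ == "__main__":
    array = binary_array("GAATTC", 3)  # hypothetical constant sequence
    print(len(array))     # number of areas, 4**3
    print(array["ACG"])   # oligonucleotide immobilized in area "ACG"
```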

175. A method according to claim 172 further comprising

(a) preparing address sets containing a complete list of
all oligonucleotides contained in a strand or strands in the
mixture which share an address oligonucleotide for substantially
every address in the oligonucleotide array on which the partials
were sorted; and

(b) determining whether an address set is a strand set by
examining whether the address set can be decomposed into other
address sets.
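
One possible reading of step (b) can be sketched in Python. Here an address set is treated as decomposable when the union of the other address sets that are proper subsets of it reproduces it exactly; an address set that cannot be rebuilt this way is taken to be a strand set. This criterion is an interpretive assumption rather than a definition given in the claim, and the example strands, the trinucleotide length, and the function names are hypothetical.

```python
def oligo_content(strand, n):
    """Return the set of all length-n oligonucleotides found in strand."""
    return frozenset(strand[i:i + n] for i in range(len(strand) - n + 1))


def address_sets(strands, n):
    """For each address oligonucleotide, pool the oligonucleotides of every
    strand that contains that address."""
    contents = [oligo_content(s, n) for s in strands]
    addresses = set().union(*contents)
    return {
        a: frozenset().union(*(c for c in contents if a in c))
        for a in addresses
    }


def is_strand_set(candidate, all_sets):
    """Assumed criterion: True if candidate is NOT the union of the other
    address sets that are proper subsets of it."""
    smaller = [s for s in all_sets if s < candidate]
    return frozenset().union(*smaller) != candidate


def strand_sets(strands, n):
    """Return the distinct address sets that pass the strand-set test."""
    sets = set(address_sets(strands, n).values())
    return {s for s in sets if is_strand_set(s, sets)}


if __name__ == "__main__":
    # Two hypothetical strands that share the trinucleotide CTA.
    for s in strand_sets(["ACCTA", "CTAGG"], 3):
        print(sorted(s))
```

In this toy mixture the address set for the shared trinucleotide is the union of the two strands' contents, so it decomposes and is rejected. The sketch presumes that every strand carries at least one oligonucleotide found in no other strand.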

176. A method according to claim 175 further comprising organizing
the oligonucleotides in a strand set into sequence blocks
composed of oligonucleotides that uniquely overlap each other,
and ordering the blocks.
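
The block-building part of claim 176 can be illustrated with a minimal Python sketch. It assumes that two oligonucleotides of length n "uniquely overlap" when they overlap by n-1 nucleotides and neither has any other partner on that side; the function name and this working definition are assumptions. The sketch only assembles the blocks and does not attempt the final step of ordering them.

```python
def sequence_blocks(oligos):
    """Merge equal-length oligos into blocks wherever an (n-1)-nucleotide
    overlap is unique in both directions."""
    oligos = set(oligos)
    succ = {o: [p for p in oligos if o[1:] == p[:-1]] for o in oligos}
    pred = {o: [p for p in oligos if p[1:] == o[:-1]] for o in oligos}

    def unique_next(o):
        # o has exactly one successor, and that successor has no other predecessor
        if len(succ[o]) == 1 and len(pred[succ[o][0]]) == 1:
            return succ[o][0]
        return None

    def unique_prev(o):
        # o has exactly one predecessor, and that predecessor has no other successor
        if len(pred[o]) == 1 and len(succ[pred[o][0]]) == 1:
            return pred[o][0]
        return None

    blocks, seen = [], set()
    starts = [o for o in sorted(oligos) if unique_prev(o) is None]
    # The second pass over all oligos picks up purely cyclic chains, if any.
    for start in starts + sorted(oligos):
        if start in seen:
            continue
        block, current = start, start
        seen.add(start)
        while True:
            nxt = unique_next(current)
            if nxt is None or nxt in seen:
                break
            block += nxt[-1]
            seen.add(nxt)
            current = nxt
        blocks.append(block)
    return blocks


if __name__ == "__main__":
    # Trinucleotides of the hypothetical strand ACCTACCG; ACC has two
    # possible successors (CCT and CCG), so the set splits into two blocks.
    print(sequence_blocks({"ACC", "CCT", "CTA", "TAC", "CCG"}))
```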

177. A method of obtaining information to order a set of first
fragments resulting from digestion of DNA with a first restriction
endonuclease, the nucleotide sequence of said fragments
having already been determined, comprising

(a) digesting the DNA with a second restriction endonuclease
to generate a set of second fragments;

(b) denaturing the second set of fragments to form a
mixture of single nucleic acid strands;

(c) sorting strands on a substantially comprehensive
oligonucleotide array;

(d) amplifying the strands to generate both their direct
and complementary copies;

(e) surveying the contents of individual areas of the array
on a first binary survey array, said first binary survey array
comprising an array of predetermined areas on a surface of a
solid support, each area having therein, covalently linked to
said surface, copies of a binary oligonucleotide, said binary
oligonucleotide having a constant nucleotide sequence which
contains a sequence complementary to the restriction recognition
site of the first restriction endonuclease and adjacent to a
variable sequence; and

(f) surveying the contents of individual areas of the array
on a second binary survey array, said second binary survey array
comprising an array of predetermined areas on a surface of a
solid support, each area having therein, covalently linked to
said surface, copies of a second binary oligonucleotide, said
second binary oligonucleotide having a constant nucleotide
sequence which contains a sequence complementary to the restriction
recognition site of the second restriction endonuclease and
adjacent to a variable sequence.

178. A method according to claim 177 wherein in step (c) strands
are hybridized to an array selected from the group consisting of

(a) a first binary sorting array, said first binary sorting
array comprising an array of immobilized oligonucleotides having
a constant nucleotide sequence complementary to the restriction
recognition site of the first restriction endonuclease, adjacent
to a variable sequence of predetermined length, the immobilized
oligonucleotides in an individual area of the first binary
sorting array having the same sequence, and

(b) a second binary sorting array, said second binary
sorting array comprising an array of immobilized oligonucleotides
having a constant nucleotide sequence complementary to the
restriction recognition site of the second restriction
endonuclease, adjacent to a variable sequence of predetermined
length, the immobilized oligonucleotides in an individual area of
the second binary sorting array having the same sequence,

and wherein following hybridization unhybridized and
imperfectly hybridized strands are removed.

179. A method for obtaining information to allocate sequenced and
ordered fragments from an original restriction digest of DNA from
sister chromosomes to chromosomal linkage groups comprising

(a) preparing a partial on an oligonucleotide array from a
restriction fragment from an alternate restriction digest of the
DNA, which partial spans first and second allelic differences in
neighboring pairs of sequenced fragments from the original
restriction digest; and

(b) determining the presence of oligonucleotides containing
the first and second allelic differences in a partial which spans
the first and second allelic differences.

180. A method according to claim 179 wherein

(a) in step b, the restriction fragment from the alternate
restriction digest is hybridized to the oligonucleotide array by
an oligonucleotide containing the first allelic difference; and

(b) the presence of an oligonucleotide containing the
second allelic difference is determined by hybridizing the
partial to a complementary second variable nucleotide sequence in
an oligonucleotide array and then detecting the presence of the
partial in the corresponding area of the oligonucleotide array.

181. A method for surveying oligonucleotides in a nucleic acid
strand comprising

(a) randomly degrading the strand into pieces, the average
length of said pieces slightly exceeding the length of
oligonucleotides surveyed;

(b) ligating the pieces to a ligating oligonucleotide
complementary to at least a portion of a constant sequence of
immobilized oligonucleotides in a binary array;

(c) hybridizing the pieces to the binary array, said binary
array having immobilized oligonucleotides in an ordered array
therein and consisting of a constant sequence adjacent to a
variable sequence, the immobilized oligonucleotides in an
individual area of the array having the same sequence; and

(d) detecting the hybrids formed.

182. A method according to claim 181 wherein the array is a 3'
array having the variable sequence at the 3' termini of the
immobilized oligonucleotides, further comprising, following step
(c), extending the immobilized oligonucleotides with a polymerase
using hybridized pieces as templates.

183. A method according to claim 182 wherein the strand is a DNA
strand resulting from a digest with a restriction endonuclease
and melting apart of the fragments obtained thereby or a partial
obtained from said strand, and wherein the constant sequence
contains the restriction endonuclease recognition site.

184. A method according to claim 183 wherein dideoxynucleotides
are used as substrates during extension of the immobilized
oligonucleotides using a DNA polymerase.

185. A method according to claim 181 wherein the ligating
oligonucleotide is pre-hybridized to the constant immobilized
oligonucleotide prior to ligation to the pieces.

186. In a primer dependent polymerase reaction for amplification
of a nucleic acid in which a primer is hybridized to a template
strand and extended by incubation with a primer dependent
polymerase and nucleotide substrates to generate a complementary copy
of the template strand; the improvement wherein:

the primer or a part thereof contains one or more primer
nucleotides that are chemically different from nucleotide
substrates incorporated in the complementary copy of the template
during the amplification, said chemical difference causing the
primer to be cleavable without cleaving the part of the
complementary copy generated during amplification.

187. A method according to claim 186 further wherein the primer
is selectively cleaved without cleaving the part of the
complementary copy generated during amplification.

188. A method according to claim 187 wherein the primer or a part
thereof contains one or more ribonucleotides triphosphates, and
the substrates used for amplification are deoxyribonucleoside
triphosphates and the primer is cleaved by a chemical or
enzymatic reaction which cleaves nucleic strands immediately 3' of
ribonucleotides but not 3' of deoxyribonucleotides.

189. A method according to claim 188 wherein the chemical
reaction or enzymatic reaction is selected from the group consisting
of

(a) alkaline hydrolysis;

(b) hydrolysis by a magnesium formamide mixture; and

(c) ribonuclease digestion.

190. A method according to claim 188 wherein a ribonucleotide is
present at the 3' terminus of the primer.
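
The selective cleavage recited in claims 188 to 190 can be modeled with a toy Python sketch. A strand is represented as a list of (base, sugar) pairs, and cleavage breaks the backbone immediately 3' of every ribonucleotide while leaving bonds 3' of deoxyribonucleotides intact. The representation, the function names, and the example primer and extension sequences are assumptions made for illustration only.

```python
def make_strand(primer, extension, ribo_positions):
    """Build a primer-plus-extension strand, written 5'->3'.

    Primer residues whose index is in ribo_positions are ribonucleotides
    ('r'); all other residues, including the whole extension, are
    deoxyribonucleotides ('d').
    """
    strand = [(b, "r" if i in ribo_positions else "d")
              for i, b in enumerate(primer)]
    strand += [(b, "d") for b in extension]
    return strand


def cleave_3prime_of_ribo(strand):
    """Split the strand immediately 3' of each ribonucleotide."""
    fragments, current = [], []
    for base, sugar in strand:
        current.append(base)
        if sugar == "r":
            fragments.append("".join(current))
            current = []
    if current:
        fragments.append("".join(current))
    return fragments


if __name__ == "__main__":
    primer, extension = "GGATCCA", "TTGACGT"  # hypothetical sequences
    # Ribonucleotide at the primer's 3' terminus, as in claim 190.
    strand = make_strand(primer, extension, {len(primer) - 1})
    print(cleave_3prime_of_ribo(strand))
```

With the ribonucleotide placed at the primer's 3' terminus, the model separates the primer from the extension product at their junction, so the extension product carries no primer residues.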

191. A method according to claim 187 wherein said nucleotide
substrates used for amplification are modified at their alpha
phosphate groups so that resulting modified phosphodiester bonds
in the complementary copy generated during amplification are
resistant to cleavage by a nuclease, said nuclease being chosen
to be incapable of cleaving said resulting modified
phosphodiester bonds, further wherein one or more primer phosphodiester
bonds are not modified to be resistant to said cleavage, and
wherein said primer is cleaved by treatment with said nuclease.

192. A method according to claim 191 wherein said nucleotide
substrates modified at their alpha phosphate groups are
nucleoside alpha-thiophosphates.

193. A method according to claim 191 wherein the nucleotide
substrates used for amplification are modified
deoxyribonucleotides.

194. An array of oligonucleotide arrays comprising a solid
sheet having a surface and an array comprising a pattern of
miniaturized oligonucleotide arrays on said surface, each
miniaturized array comprising an array of predetermined areas on said
surface, each area having therein, covalently linked to said
surface, multiple copies of an oligonucleotide of a predetermined
sequence.

195. A method according to claim 68 further comprising 

(a) contacting at least one area of said array containing 
the immobilized copies with at least one oligonucleotide probe 
having a predetermined sequence, under conditions promoting 
hybridization of said at least one probe; and

(b) determining whether or not said at least one probe has
hybridized to said at least one area.

196. A method according to claim 144 further comprising

(a) contacting at least one area of said array containing
the immobilized partial copies with at least one oligonucleotide
probe having a predetermined sequence, under conditions promoting
hybridization of said at least one probe; and

(b) determining whether or not said at least one probe has
hybridized to said at least one area.

197. A method according to claim 170, wherein determining the
presence and sequence of all variable oligonucleotides comprises

(a) contacting said substantially complete set of partials
with a substantially comprehensive set of oligonucleotide probes,
each of a predetermined length, under conditions promoting
hybridization of said probes; and

(b) determining to which partials each said probe has
hybridized.

Unindexed address sets



[Figures 9a and 9b: columns of trinucleotide address sets, shown under the panel titles "Unindexed address sets", "Grouped address sets", and "Identified strand sets". The individual entries are not recoverable from the scanned text.]